• Title/Summary/Keyword: Robust Statistics

Bankruptcy prediction using ensemble SVM model (앙상블 SVM 모형을 이용한 기업 부도 예측)

  • Choi, Ha Na;Lim, Dong Hoon
    • Journal of the Korean Data and Information Science Society / v.24 no.6 / pp.1113-1125 / 2013
  • Corporate bankruptcy prediction has long been an important topic in accounting and finance. Several data mining techniques have been used for bankruptcy prediction; however, a single model has many limitations when applied to real classification problems. This study proposes an ensemble SVM (support vector machine) model that combines SVM models, each with a different kernel function. The ensemble is built and evaluated with a v-fold cross-validation approach: the k top-performing models are recruited into the ensemble, and classification is then carried out by majority vote of the ensemble members. We compare the ensemble SVM classifier with single SVM classifiers in terms of accuracy, error rate, sensitivity, specificity, ROC curve, and AUC on a financial-ratio dataset and a simulated dataset. The results confirm the advantages of the method: it is robust while providing good performance.
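The selection-and-voting step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: SVM training and cross-validation are omitted, and the kernel names, cross-validated accuracies, and predictions are hypothetical placeholders (1 = bankrupt, 0 = non-bankrupt).

```python
from collections import Counter


def top_k_ensemble(cv_scores, k):
    """Return the names of the k models with the highest cross-validated accuracy."""
    return sorted(cv_scores, key=cv_scores.get, reverse=True)[:k]


def majority_vote(predictions, members):
    """Combine per-sample labels from the selected members by majority vote.

    predictions maps model name -> list of predicted labels, one per sample.
    Tied counts resolve to the label first encountered (Counter ordering).
    """
    n_samples = len(predictions[members[0]])
    voted = []
    for i in range(n_samples):
        counts = Counter(predictions[m][i] for m in members)
        voted.append(counts.most_common(1)[0][0])
    return voted


# Hypothetical cross-validated accuracies of SVMs with different kernels
cv_scores = {"linear": 0.81, "poly": 0.78, "rbf": 0.86, "sigmoid": 0.70}
# Hypothetical test-set predictions (1 = bankrupt, 0 = non-bankrupt)
predictions = {
    "linear": [1, 0, 1, 1],
    "poly": [0, 0, 1, 0],
    "rbf": [1, 0, 0, 1],
    "sigmoid": [0, 1, 0, 0],
}

members = top_k_ensemble(cv_scores, k=3)
print(members)                              # ['rbf', 'linear', 'poly']
print(majority_vote(predictions, members))  # [1, 0, 1, 1]
```

With an odd k and two classes a strict majority always exists; with an even k this sketch falls back to Counter's first-encountered ordering among tied labels, which is an arbitrary choice.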

Adaptive stochastic gradient method under two mixing heterogenous models (두 이종 혼합 모형에서의 수정된 경사 하강법)

  • Moon, Sang Jun;Jeon, Jong-June
    • Journal of the Korean Data and Information Science Society / v.28 no.6 / pp.1245-1255 / 2017
  • Online learning is the process of obtaining the solution of a given objective function when data accumulate in real time or in batches. Stochastic gradient descent is one of the most widely used methods for online learning. It is easy to implement, and its solution has good properties under the assumption that the data-generating model is homogeneous. However, stochastic gradient descent can severely mislead online learning when homogeneity is actually violated. We assume that the observations come from two heterogeneous generating models and propose a new stochastic gradient method that mitigates the problem of heterogeneity. We introduce a robust mini-batch optimization method using statistical tests and investigate the convergence radius of its solution. The theoretical results are confirmed by numerical simulations.
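The abstract does not specify the test used to screen mini-batches, so the following is only a hypothetical sketch of the general idea in the simplest possible setting: SGD on squared loss estimates a location parameter, and a z-test on each mini-batch mean (a stand-in for the paper's statistical test) skips batches that look as if they came from the second generating model. The threshold `z_crit`, the contamination pattern, and all function names are assumptions.

```python
import math
import random
import statistics


def plain_sgd_mean(batches):
    """Mini-batch SGD on squared loss with step size 1/t (a running mean)."""
    theta = 0.0
    for t, batch in enumerate(batches, start=1):
        # Gradient of the batch-averaged loss 0.5 * (theta - x)^2 is theta - batch mean
        theta -= (theta - statistics.fmean(batch)) / t
    return theta


def gated_sgd_mean(batches, z_crit=3.0):
    """Same update, but a batch is skipped when a z-test on its mean rejects
    the current estimate (hypothetical gate, not the paper's test)."""
    theta = statistics.fmean(batches[0])  # assumes the first batch is clean
    n_updates = 1
    for batch in batches[1:]:
        mean = statistics.fmean(batch)
        se = statistics.stdev(batch) / math.sqrt(len(batch))
        if abs(mean - theta) / se > z_crit:
            continue  # treat the batch as coming from the other model
        n_updates += 1
        theta -= (theta - mean) / n_updates
    return theta


# Synthetic data: every fifth batch comes from a second model centred at 10
rng = random.Random(0)
batches = []
for b in range(500):
    mu = 10.0 if b % 5 == 4 else 0.0
    batches.append([rng.gauss(mu, 1.0) for _ in range(20)])

plain = plain_sgd_mean(batches)
robust = gated_sgd_mean(batches)
print(f"plain SGD: {plain:.3f}, gated SGD: {robust:.3f}")
```

In this synthetic setting plain SGD drifts toward the mixture mean (about 2.0), whereas the gated version stays near the majority model's mean of 0. The gate depends on a reasonable initial estimate; here the first batch is clean by construction.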

Nonlinear Speech Enhancement Method for Reducing the Amount of Speech Distortion According to Speech Statistics Model (음성 통계 모형에 따른 음성 왜곡량 감소를 위한 비선형 음성강조법)

  • Choi, Jae-Seung
    • The Journal of the Korea institute of electronic communication sciences / v.16 no.3 / pp.465-470 / 2021
  • Speech recognition in real environments, where speech is mixed with noise, requires robust technology that degrades neither recognition performance nor speech quality. Alongside such technology, applications are needed that achieve stable, high recognition rates even in noise whose spectrum resembles that of human speech. This paper therefore proposes a speech enhancement algorithm that performs noise suppression based on the MMSE-STSA estimation algorithm, a short-time spectral amplitude method based on the minimum mean square error. The algorithm is an effective nonlinear, single-channel speech enhancement method with high noise-suppression performance, and it reduces speech distortion by relying on a statistical model of speech. To verify the effectiveness of the proposed algorithm, the experiments compare the input and output speech waveforms.
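For reference, the classical short-time spectral amplitude estimator based on the minimum mean square error (MMSE-STSA) of Ephraim and Malah multiplies the noisy amplitude R of each frequency bin by the gain $G = \frac{\sqrt{\pi}}{2}\frac{\sqrt{v}}{\gamma}e^{-v/2}\left[(1+v)I_0(v/2) + v I_1(v/2)\right]$ with $v = \xi\gamma/(1+\xi)$. The sketch below assumes the a priori SNR ξ and a posteriori SNR γ are already estimated; how the paper estimates them and how it further reduces distortion are not shown.

```python
import math


def bessel_i(n, x):
    """Modified Bessel function I_n(x) of integer order n >= 0, via its power series."""
    half = x / 2.0
    term = half ** n / math.factorial(n)  # k = 0 term
    total = 0.0
    k = 0
    while True:
        total += term
        k += 1
        term *= half * half / (k * (k + n))  # ratio of consecutive series terms
        if term <= 1e-17 * total:
            return total


def mmse_stsa_gain(xi, gamma):
    """Ephraim-Malah MMSE-STSA gain for one frequency bin.

    xi:    a priori SNR (linear scale, > 0)
    gamma: a posteriori SNR, |noisy amplitude|^2 / noise power (linear, > 0)
    """
    v = xi * gamma / (1.0 + xi)
    bracket = (1.0 + v) * bessel_i(0, v / 2.0) + v * bessel_i(1, v / 2.0)
    return (math.sqrt(math.pi) / 2.0) * (math.sqrt(v) / gamma) * math.exp(-v / 2.0) * bracket


# Gain rises with the a priori SNR for a fixed a posteriori SNR
for xi in (0.1, 1.0, 10.0):
    print(xi, round(mmse_stsa_gain(xi, 2.0), 3))
```

The power-series Bessel functions are adequate for moderate SNRs but overflow for very large v; production code would normally use exponentially scaled versions such as `scipy.special.i0e` and `scipy.special.i1e`. At high SNR the gain approaches the Wiener-like value ξ/(1+ξ).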

Applying the ANFIS to the Analysis of Rain and Dark Effects on the Saturation Headways at Signalized Intersections (강우 및 밝기에 따른 신호교차로 포화차두시간 분석에의 적응 뉴로-퍼지 적용)

  • Kim, Kyung Whan;Chung, Jae Whan;Kim, Daehyon
    • KSCE Journal of Civil and Environmental Engineering Research / v.26 no.4D / pp.573-580 / 2006
  • The saturation headway is a major parameter for estimating intersection capacity and setting signal timing. However, existing algorithms are still far from robust in dealing with the factors that cause saturation headways at signalized intersections to vary. This study therefore applies a fuzzy inference system using ANFIS (adaptive neuro-fuzzy inference system). ANFIS provides a fuzzy modeling procedure that learns from a data set, computing the membership function parameters that best allow the associated fuzzy inference system to track the given input/output data. Climate conditions and the degree of brightness were chosen as the input variables for heavy-vehicle rates of 10-25%. These factors are inherently difficult to quantify, which is why they were treated as fuzzy variables. A neuro-fuzzy inference model to estimate saturation headways at signalized intersections was constructed. Evaluated with the statistics $R^2$, MAE, and MSE, the model showed very high explanatory power, with values of 0.993, 0.0289, and 0.0173, respectively.
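The three goodness-of-fit statistics used to evaluate the model can be computed as below. This sketch assumes $R^2$ is the coefficient of determination 1 - SSE/SST (the abstract does not say whether a squared correlation was used instead), and the headway values are hypothetical, not the study's data.

```python
def fit_statistics(observed, predicted):
    """Return (R^2, MAE, MSE), with R^2 taken as 1 - SSE/SST (an assumption)."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    sse = sum(r * r for r in residuals)
    mean_obs = sum(observed) / n
    sst = sum((o - mean_obs) ** 2 for o in observed)
    mae = sum(abs(r) for r in residuals) / n
    mse = sse / n
    return 1.0 - sse / sst, mae, mse


# Hypothetical observed and model-predicted saturation headways (seconds)
observed = [2.0, 2.2, 2.4, 2.6]
predicted = [2.1, 2.2, 2.3, 2.6]
r2, mae, mse = fit_statistics(observed, predicted)
print(f"R^2={r2:.3f}  MAE={mae:.4f}  MSE={mse:.4f}")
```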

A Study of Robust Design of FCM Gasket Using Taguchi Method (다구찌 기법을 이용한 FCM 가스켓의 강건 설계에 관한 연구)

  • Chung, Jin-Eun;Ahn, Jueng-Kyu
    • Journal of the Korea Academia-Industrial cooperation Society / v.14 no.7 / pp.3177-3183 / 2013
  • This paper deals with the robust design of the non-asbestos FCM (Fiber-elastomer Coated Metal) gasket. To this end, shear stress was measured according to a design of experiments based on an orthogonal array, and the control factors for shear stress were evaluated using larger-the-better SN ratios with the Taguchi method. An analysis of variance of the SN ratios was also conducted. Temperature, pressure, duration time, and humidity were selected as the control factors. The orthogonal array $L_9(3^4)$ was constructed with 3 levels for each factor, and shear stress was measured according to this array. The delta statistic of duration time has the highest value, 0.93, so duration time has the largest effect on the shear stress of the gasket. The analysis also shows that shear stress is maximized at a duration time of 80 sec, a temperature of $200^{\circ}C$, a pressure of 90 $kgf/cm^2$, and a humidity of 60 %RH. The analysis of variance gives P values of 0.037 for duration time and 0.098 for temperature, so these factors are significant at the 95% and 90% confidence levels, respectively.
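The larger-the-better S/N ratio, $SN = -10\log_{10}\left(\frac{1}{n}\sum_{i=1}^{n} 1/y_i^2\right)$, and the delta statistic (the range of the level-wise mean S/N ratios for each factor) can be sketched as follows with the standard $L_9(3^4)$ array. The assignment of factors to columns is hypothetical, and the responses are synthetic, constructed to depend only on the "time" column so that its delta should dominate.

```python
import math

# Standard L9(3^4) orthogonal array: 9 runs x 4 factors, levels coded 1..3
L9 = [
    (1, 1, 1, 1), (1, 2, 2, 2), (1, 3, 3, 3),
    (2, 1, 2, 3), (2, 2, 3, 1), (2, 3, 1, 2),
    (3, 1, 3, 2), (3, 2, 1, 3), (3, 3, 2, 1),
]
# Hypothetical assignment of factors to columns
FACTORS = ["temperature", "pressure", "time", "humidity"]


def sn_larger_the_better(ys):
    """Larger-the-better S/N ratio in dB: -10 log10(mean of 1/y^2)."""
    return -10.0 * math.log10(sum(1.0 / (y * y) for y in ys) / len(ys))


def delta_statistics(runs, factor_names=FACTORS, array=L9):
    """Delta per factor: max minus min of the level-wise mean S/N ratios."""
    sn = [sn_larger_the_better(ys) for ys in runs]
    deltas = {}
    for j, name in enumerate(factor_names):
        level_means = []
        for level in (1, 2, 3):
            vals = [s for row, s in zip(array, sn) if row[j] == level]
            level_means.append(sum(vals) / len(vals))
        deltas[name] = max(level_means) - min(level_means)
    return deltas


# Synthetic responses that depend only on the 'time' column (one replicate per run)
runs = [[10.0 * row[2]] for row in L9]
deltas = delta_statistics(runs)
for name in sorted(deltas, key=deltas.get, reverse=True):
    print(f"{name:12s} delta = {deltas[name]:.2f}")
```

Because the array is balanced, every level of each other factor sees each "time" level exactly once, so their deltas come out essentially zero.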

An improvement of MT transfer function estimates using by pre-screening scheme based on the statistical distribution of electromagnetic fields (통계적 사전 처리방법을 통한 MT 전달함수 추정의 향상 기법 연구)

  • Yang Junmo;Kwon Byung-Doo;Lee Duk-Kee;Song Youn-Ho;Youn Yong-Hoon
    • 한국지구물리탐사학회:학술대회논문집 (Korean Society of Earth and Exploration Geophysicists: Conference Proceedings) / 2005.05a / pp.273-280 / 2005
  • Robust magnetotelluric (MT) response function estimators are now in standard use in electromagnetic induction research. Properly devised and applied, these methods can reduce the influence of unusual data (outliers) in the response (electric field) variable, but they are often insensitive to exceptional predictor (magnetic field) data, termed leverage points. A bounded influence estimator is described that simultaneously limits the influence of both outliers and leverage points, and it has proven to consistently yield more reliable MT response function estimates than the conventional robust approach. The bounded influence estimator combines a standard robust M-estimator with leverage weighting based on the statistics of the hat matrix diagonal, a standard statistical measure of unusual predictors. Further extensions to MT data analysis are proposed, including a data rejection criterion, based on the statistical distribution of the electromagnetic field, that minimizes the influence of both electric and magnetic outliers in the frequency domain. The rejection scheme appears effective at eliminating extreme data in the frequency domain that are not removed even by the BI (bounded influence) estimator. The effectiveness and advantages of these developments are illustrated using real MT data.
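The idea of combining an M-estimator with hat-matrix leverage weighting can be sketched for the simplest case, a straight-line regression with one real predictor. This is a hypothetical illustration, not the estimator from the paper (MT transfer functions are complex-valued and multivariate): it uses Huber weights with k = 1.345, a median-absolute-residual scale, and an assumed leverage rule that downweights points whose hat diagonal exceeds 2p/n.

```python
import statistics


def wls_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ym - b * xm, b


def hat_diagonal(x):
    """Diagonal of the hat matrix for the design [1, x] in simple regression."""
    n = len(x)
    xm = sum(x) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    return [1.0 / n + (xi - xm) ** 2 / sxx for xi in x]


def bounded_influence_line(x, y, k=1.345, max_iter=100, tol=1e-10):
    """Huber M-estimation by IRLS, with every weight multiplied by a fixed
    leverage weight (assumed rule: min(1, (2p/n) / h_i) with p = 2)."""
    n, p = len(x), 2
    lev_w = [min(1.0, (2.0 * p / n) / h) for h in hat_diagonal(x)]
    a, b = wls_line(x, y, [1.0] * n)  # start from ordinary least squares
    for _ in range(max_iter):
        r = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust residual scale: median absolute residual / 0.6745
        s = max(statistics.median([abs(ri) for ri in r]) / 0.6745, 1e-12)
        huber_w = [1.0 if abs(ri) <= k * s else k * s / abs(ri) for ri in r]
        w = [hw * lw for hw, lw in zip(huber_w, lev_w)]
        a_new, b_new = wls_line(x, y, w)
        if abs(a_new - a) < tol and abs(b_new - b) < tol:
            return a_new, b_new
        a, b = a_new, b_new
    return a, b


# Synthetic data: y = 1 + 2x plus small noise, and one bad leverage point at x = 15
noise = [0.1, -0.2, 0.15, -0.05, 0.0, 0.2, -0.1, 0.05, -0.15, 0.0]
x_pts = list(range(10)) + [15]
y_pts = [1.0 + 2.0 * xi + e for xi, e in zip(range(10), noise)] + [0.0]

a_ols, b_ols = wls_line(x_pts, y_pts, [1.0] * len(x_pts))
a_bi, b_bi = bounded_influence_line(x_pts, y_pts)
print(f"OLS slope {b_ols:.3f}, bounded-influence slope {b_bi:.3f}")
```

Ordinary least squares is dragged far below the true slope of 2 by the single bad leverage point, because that point pulls the fit toward itself; the combined Huber and leverage weights progressively discount it and recover a slope close to 2.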

Combining Information of Common Metabolites Reveals Global Differences between Colorectal Cancerous and Normal Tissues

  • Chae, Young-Kee;Kang, Woo-Young;Kim, Seong-Hwan;Joo, Jong-Eun;Han, Joon-Kil;Hong, Boo-Whan
    • Bulletin of the Korean Chemical Society / v.31 no.2 / pp.379-383 / 2010
  • Metabolites of colorectal cancer tissues from 12 patients were analyzed by two-dimensional NMR spectroscopy and compared with those of normal tissues. The NMR data were analyzed with the help of a metabolome database and statistical software. Cancerous tissues showed significantly altered metabolic profiles compared with normal tissues: the concentrations of taurine, glutamate, and choline were notably increased in the cancerous tissues of most patients, while those of glucose, malate, and glycerol were decreased. Changes in individual metabolites varied substantially from patient to patient, but the combination of these changes could distinguish cancerous tissues from normal ones, as shown by principal component analysis (PCA). A traditional chemometric analysis was also performed using AMIX software. Comparing the two results, the analysis based on $^{1}\mathrm{H}$-$^{13}\mathrm{C}$ HSQC spectra proved more robust and effective in assessing and classifying the global metabolic profiles of the colorectal tissues.

Effects of Symptom Recognition and Health Behavior Compliance on Hospital Arrival Time in Patients with Acute Myocardial Infarction (급성심근경색증 환자의 증상 인지와 건강행위 이행이 내원시간에 미치는 영향)

  • Han, Eun Ju;Kim, Jeong Sun
    • Korean Journal of Adult Nursing / v.27 no.1 / pp.83-93 / 2015
  • Purpose: This study investigated the relationships among symptom recognition, health behavior compliance, and hospital arrival time in order to identify factors influencing hospital arrival time in patients with acute myocardial infarction (AMI). Methods: The subjects were 200 patients with AMI at C hospital in D city. Data were analyzed using descriptive statistics, independent t-tests, one-way ANOVA, Pearson's correlation coefficients, and stepwise multiple linear regression. Results: Levels of symptom recognition and health behavior compliance were low. The median hospital arrival time was 4.48 hours (2.43 hours for ST-segment elevation MI and 7.83 hours for non-ST-segment elevation MI). Among the studied factors, only symptom recognition had a statistically significant positive correlation with health behavior compliance (r=0.38, p<.001). The factors influencing hospital arrival time were MI classification, diabetes mellitus (DM), and the mode of transport to the first hospital, which together accounted for 13% of the variance in hospital arrival time in AMI patients. Conclusion: To prevent delays in hospital arrival among MI patients, more robust nursing intervention strategies tailored to MI classification and DM are necessary, and further education on the importance of transportation use is also needed.

A comparison study of classification method based of SVM and data depth in microarray data (마이크로어레이 자료에서 서포트벡터머신과 데이터 뎁스를 이용한 분류방법의 비교연구)

  • Hwang, Jin-Soo;Kim, Jee-Yun
    • Journal of the Korean Data and Information Science Society / v.20 no.2 / pp.311-319 / 2009
  • Jornsten (2004) used the robust L1 data depth for clustering and classification, in methods called DDclus and DDclass. SVM-based classification works well in most situations but shows some weakness in the presence of outliers. Proper gene selection is important in classification because microarray data contain many redundant genes, and either selecting appropriate genes or clustering genes, combined with a classification method, enhances the overall classification performance. The performance of the depth-based methods is evaluated against several SVM-based classification methods.
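The L1 data depth of a point x with respect to a sample $x_1,\dots,x_n$ is $D(x) = 1 - \left\lVert \frac{1}{n}\sum_{i=1}^{n} \frac{x - x_i}{\lVert x - x_i\rVert}\right\rVert$, and a simple max-depth classifier assigns x to the class in which it is deepest. The sketch below is a simplified stand-in for DDclass, whose details the abstract does not give: sample points that coincide with x are simply skipped, and the two toy classes are hypothetical.

```python
import math


def l1_depth(point, sample):
    """L1 (spatial) depth: 1 minus the norm of the average unit vector
    pointing from each sample point to `point`.

    Sample points that coincide with `point` are skipped (simplified ties).
    """
    dim = len(point)
    total = [0.0] * dim
    for s in sample:
        diff = [p - q for p, q in zip(point, s)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm == 0.0:
            continue
        for j in range(dim):
            total[j] += diff[j] / norm
    return 1.0 - math.sqrt(sum(t * t for t in total)) / len(sample)


def max_depth_classify(point, classes):
    """Assign `point` to the class label whose training sample it is deepest in."""
    return max(classes, key=lambda label: l1_depth(point, classes[label]))


# Two hypothetical two-dimensional classes (e.g. expression of two selected genes)
square = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
classes = {"A": square, "B": [(u + 5.0, v + 5.0) for u, v in square]}
print(l1_depth((0.0, 0.0), square))               # 1.0
print(max_depth_classify((0.2, -0.1), classes))   # A
print(max_depth_classify((5.3, 4.8), classes))    # B
```

Because each sample point contributes only a unit vector, replacing one sample point by an arbitrarily distant outlier changes the depth of any fixed point by at most 2/n, which is the source of the method's robustness.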

A Procedure for Indentifying Outliers in Multivariate Data (다변량 자료에서 다수 이상치 인식의 절차)

  • Yum, Joon-Keun;Park, Jong-Goo;Kim, Jong-Woo
    • Journal of Korean Society for Quality Management / v.23 no.4 / pp.28-41 / 1995
  • We consider the problem of identifying multiple outliers in linear models. Available regression diagnostic methods often fail to detect multiple outliers because of masking and swamping effects. Among the various robust estimators that reduce the effect of outliers, the LMS (least median of squares) estimator has been proposed as a suitable method for exposing outliers and leverage points. However, computing the LMS estimator requires taking the median of the squared residuals for fits based on subsamples drawn from the data. This causes difficulty because the number of possible subsamples is $\binom{n}{p}$, which grows rapidly as the sample size n increases. In addition, the covariance matrix of a subsample may be singular, or nearly so when the data approach collinearity. We therefore propose ELMS, a resampling procedure for the LMS method, and study the size of the effective elementary set in this algorithm.
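LMS fits are usually approximated by resampling elemental subsets of p points, which is where the $\binom{n}{p}$ count comes from. The sketch below does this for a straight line (p = 2) with random pairs; it is plain random-subset LMS, not the paper's ELMS procedure, and the data are synthetic. Pairs with equal x values are skipped, the one-predictor analogue of a singular elemental subset.

```python
import math
import random
import statistics


def lms_line(x, y, n_subsets=500, seed=0):
    """Approximate least-median-of-squares fit of y = a + b*x.

    Each random elemental subset is a pair of points; the line through the
    pair is scored by the median squared residual over all n points.
    """
    rng = random.Random(seed)
    n = len(x)
    best = None
    for _ in range(n_subsets):
        i, j = rng.sample(range(n), 2)
        if x[i] == x[j]:
            continue  # singular elemental subset: slope undefined
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        crit = statistics.median([(yk - a - b * xk) ** 2 for xk, yk in zip(x, y)])
        if best is None or crit < best[0]:
            best = (crit, a, b)
    return best[1], best[2]


# Synthetic data: y = 3 + 0.5x, with 6 of 20 responses shifted upward by 20
x_obs = list(range(20))
y_obs = [3.0 + 0.5 * xi for xi in x_obs]
for k in (2, 5, 9, 13, 16, 18):
    y_obs[k] += 20.0

a_lms, b_lms = lms_line(x_obs, y_obs)
print(a_lms, b_lms)
# Number of elemental subsets an exhaustive search would need
print(math.comb(20, 2), math.comb(100, 5))  # 190 75287520
```

With 30% contamination the clean line still wins, since it fits 14 of the 20 points exactly and so has a median squared residual of zero. `math.comb` shows how quickly exhaustive enumeration grows, which is what motivates resampling.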
