• Title/Summary/Keyword: Sample Vector

Search Result 270, Processing Time 0.022 seconds

Adaptive lasso in sparse vector autoregressive models (Adaptive lasso를 이용한 희박벡터자기회귀모형에서의 변수 선택)

  • Lee, Sl Gi;Baek, Changryong
    • The Korean Journal of Applied Statistics
    • /
    • v.29 no.1
    • /
    • pp.27-39
    • /
    • 2016
  • This paper considers variable selection in the sparse vector autoregressive (sVAR) model where sparsity comes from setting small coefficients to exact zeros. In the estimation perspective, Davis et al. (2015) showed that the lasso type of regularization method is successful because it provides a simultaneous variable selection and parameter estimation even for time series data. However, their simulations study reports that the regular lasso overestimates the number of non-zero coefficients, hence its finite sample performance needs improvements. In this article, we show that the adaptive lasso significantly improves the performance where the adaptive lasso finds the sparsity patterns superior to the regular lasso. Some tuning parameter selections in the adaptive lasso are also discussed from the simulations study.

Credit Risk Evaluations of Online Retail Enterprises Using Support Vector Machines Ensemble: An Empirical Study from China

  • LI, Xin;XIA, Han
    • The Journal of Asian Finance, Economics and Business
    • /
    • v.9 no.8
    • /
    • pp.89-97
    • /
    • 2022
  • The e-commerce market faces significant credit risks due to the complexity of the industry and information asymmetries. Therefore, credit risk has started to stymie the growth of e-commerce. However, there is no reliable system for evaluating the creditworthiness of e-commerce companies. Therefore, this paper constructs a credit risk evaluation index system that comprehensively considers the online and offline behavior of online retail enterprises, including 15 indicators that reflect online credit risk and 15 indicators that reflect offline credit risk. This paper establishes an integration method based on a fuzzy integral support vector machine, which takes the factor analysis results of the credit risk evaluation index system of online retail enterprises as the input and the credit risk evaluation results of online retail enterprises as the output. The classification results of each sub-classifier and the importance of each sub-classifier decision to the final decision have been taken into account in this method. Select the sample data of 1500 online retail loan customers from a bank to test the model. The empirical results demonstrate that the proposed method outperforms a single SVM and traditional SVMs aggregation technique via majority voting in terms of classification accuracy, which provides a basis for banks to establish a reliable evaluation system.

Machine learning-based Predictive Model of Suicidal Thoughts among Korean Adolescents. (머신러닝 기반 한국 청소년의 자살 생각 예측 모델)

  • YeaJu JIN;HyunKi KIM
    • Journal of Korea Artificial Intelligence Association
    • /
    • v.1 no.1
    • /
    • pp.1-6
    • /
    • 2023
  • This study developed models using decision forest, support vector machine, and logistic regression methods to predict and prevent suicidal ideation among Korean adolescents. The study sample consisted of 51,407 individuals after removing missing data from the raw data of the 18th (2022) Youth Health Behavior Survey conducted by the Korea Centers for Disease Control and Prevention. Analysis was performed using the MS Azure program with Two-Class Decision Forest, Two-Class Support Vector Machine, and Two-Class Logistic Regression. The results of the study showed that the decision forest model achieved an accuracy of 84.8% and an F1-score of 36.7%. The support vector machine model achieved an accuracy of 86.3% and an F1-score of 24.5%. The logistic regression model achieved an accuracy of 87.2% and an F1-score of 40.1%. Applying the logistic regression model with SMOTE to address data imbalance resulted in an accuracy of 81.7% and an F1-score of 57.7%. Although the accuracy slightly decreased, the recall, precision, and F1-score improved, demonstrating excellent performance. These findings have significant implications for the development of prediction models for suicidal ideation among Korean adolescents and can contribute to the prevention and improvement of youth suicide.

Tailoring Magnetic Interlayer Coupling Contribution via Lateral Confinement (가로 가둠을 통한 자성층간 결합 기여도 조절)

  • Lee, Dong Ryeol
    • Journal of the Korean Magnetics Society
    • /
    • v.26 no.5
    • /
    • pp.149-153
    • /
    • 2016
  • In Fe/Gd multilayers, patterning effect on the interlayer coupling was studied by comparing patterned and unpatterned samples that were cut from a multilayer film. A comparative study of the two samples via temperature dependent Gd-specific magnetization vector using X-ray magnetic circular dichroism (XMCD) shows that the temperature dependence of the Gd magnetization vector can be modified in the patterned sample due to a competition between the patterning and antiferromagnetic interlayer coupling effects.

A note on nonparametric density deconvolution by weighted kernel estimators

  • Lee, Sungho
    • Journal of the Korean Data and Information Science Society
    • /
    • v.25 no.4
    • /
    • pp.951-959
    • /
    • 2014
  • Recently Hazelton and Turlach (2009) proposed a weighted kernel density estimator for the deconvolution problem. In the case of Gaussian kernels and measurement error, they argued that the weighted kernel density estimator is a competitive estimator over the classical deconvolution kernel estimator. In this paper we consider weighted kernel density estimators when sample observations are contaminated by double exponentially distributed errors. The performance of the weighted kernel density estimators is compared over the classical deconvolution kernel estimator and the kernel density estimator based on the support vector regression method by means of a simulation study. The weighted density estimator with the Gaussian kernel shows numerical instability in practical implementation of optimization function. However the weighted density estimates with the double exponential kernel has very similar patterns to the classical kernel density estimates in the simulations, but the shape is less satisfactory than the classical kernel density estimator with the Gaussian kernel.

COMPARISON STUDY OF BIVARIATE LAPLACE DISTRIBUTIONS WITH THE SAME MARGINAL DISTRIBUTION

  • Hong, Chong-Sun;Hong, Sung-Sick
    • Journal of the Korean Statistical Society
    • /
    • v.33 no.1
    • /
    • pp.107-128
    • /
    • 2004
  • Bivariate Laplace distributions for which both marginal distributions and Laplace are discussed. Three kinds of bivariate Laplace distributions which are extended bivariate exponential distributions of Gumbel (1960) are introduced in this paper. These symmetrical distributions are compared with asymmetrical distributions of Kotz et al. (2000). Their probability density functions, cumulative distribution functions are derived. Conditional skewnesses and kurtoses are also defined. Their correlation coefficients are calculated and compared with others. We proposed bivariate random vector generating methods whose distributions are bivariate Laplace. With sample means and medians obtained from generated random vectors, variance and covariance matrices of means and medians are calculated and discussed with those of bivariate normal distribution.

A Prediction Model Based on Relevance Vector Machine and Granularity Analysis

  • Cho, Young Im
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.16 no.3
    • /
    • pp.157-162
    • /
    • 2016
  • In this paper, a yield prediction model based on relevance vector machine (RVM) and a granular computing model (quotient space theory) is presented. With a granular computing model, massive and complex meteorological data can be analyzed at different layers of different grain sizes, and new meteorological feature data sets can be formed in this way. In order to forecast the crop yield, a grey model is introduced to label the training sample data sets, which also can be used for computing the tendency yield. An RVM algorithm is introduced as the classification model for meteorological data mining. Experiments on data sets from the real world using this model show an advantage in terms of yield prediction compared with other models.

On the Dependence Structure of Concornitants of Order Statistics

  • Song-Ho Kim;Tae-Sung Kim
    • Journal of the Korean Statistical Society
    • /
    • v.25 no.2
    • /
    • pp.255-263
    • /
    • 1996
  • Let $(X_{1j}, X_{2j}, … , X_{nj}, Y_j/)$j = 1, 2, … , n, be a sample of size n on an (m + l)-dimensional vector $(X_1, X_2, … , X_m, Y)$, m .geq. 1. If $Y_{(r)}$ denote the rth order statistic from Y, then the $X_{[r:n]}$ paired with $Y_(r)$ is termed the concomitant vector of the order statistics. The general distributions of concomitant of order statistics will be found. The mean, variance and covariance of$X_{[r:n]}$ Will be studied. Then we will apply the results to the multivariate normal variate case.e.

  • PDF

Comparing Machine Learning Classifiers for Movie WOM Opinion Mining

  • Kim, Yoosin;Kwon, Do Young;Jeong, Seung Ryul
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.9 no.8
    • /
    • pp.3169-3181
    • /
    • 2015
  • Nowadays, online word-of-mouth has become a powerful influencer to marketing and sales in business. Opinion mining and sentiment analysis is frequently adopted at market research and business analytics field for analyzing word-of-mouth content. However, there still remain several challengeable areas for 1) sentiment analysis aiming for Korean word-of-mouth content in film market, 2) availability of machine learning models only using linguistic features, 3) effect of the size of the feature set. This study took a sample of 10,000 movie reviews which had posted extremely negative/positive rating in a movie portal site, and conducted sentiment analysis with four machine learning algorithms: naïve Bayesian, decision tree, neural network, and support vector machines. We found neural network and support vector machine produced better accuracy than naïve Bayesian and decision tree on every size of the feature set. Besides, the performance of them was boosting with increasing of the feature set size.

Cross platform classification of microarrays by rank comparison

  • Lee, Sunho
    • Journal of the Korean Data and Information Science Society
    • /
    • v.26 no.2
    • /
    • pp.475-486
    • /
    • 2015
  • Mining the microarray data accumulated in the public data repositories can save experimental cost and time and provide valuable biomedical information. Big data analysis pooling multiple data sets increases statistical power, improves the reliability of the results, and reduces the specific bias of the individual study. However, integrating several data sets from different studies is needed to deal with many problems. In this study, I limited the focus to the cross platform classification that the platform of a testing sample is different from the platform of a training set, and suggested a simple classification method based on rank. This method is compared with the diagonal linear discriminant analysis, k nearest neighbor method and support vector machine using the cross platform real example data sets of two cancers.