• Title/Summary/Keyword: lasso

Search Result 169, Processing Time 0.02 seconds

Prediction of the Following BCI Performance by Means of Spectral EEG Characteristics in the Prior Resting State (뇌신호 주파수 특성을 이용한 CNN 기반 BCI 성능 예측)

  • Kang, Jae-Hwan;Kim, Sung-Hee;Youn, Joosang;Kim, Junsuk
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.9 no.11
    • /
    • pp.265-272
    • /
    • 2020
  • In the research of brain computer interface (BCI) technology, one of the big problems encountered is how to deal with some people as called the BCI-illiteracy group who could not control the BCI system. To approach this problem efficiently, we investigated a kind of spectral EEG characteristics in the prior resting state in association with BCI performance in the following BCI tasks. First, spectral powers of EEG signals in the resting state with both eyes-open and eyes-closed conditions were respectively extracted. Second, a convolution neural network (CNN) based binary classifier discriminated the binary motor imagery intention in the BCI task. Both the linear correlation and binary prediction methods confirmed that the spectral EEG characteristics in the prior resting state were highly related to the BCI performance in the following BCI task. Linear regression analysis demonstrated that the relative ratio of the 13 Hz below and above the spectral power in the resting state with only eyes-open, not eyes-closed condition, were significantly correlated with the quantified metrics of the BCI performance (r=0.544). A binary classifier based on the linear regression with L1 regularization method was able to discriminate the high-performance group and low-performance group in the following BCI task by using the spectral-based EEG features in the precedent resting state (AUC=0.817). These results strongly support that the spectral EEG characteristics in the frontal regions during the resting state with eyes-open condition should be used as a good predictor of the following BCI task performance.

A study on entertainment TV show ratings and the number of episodes prediction (국내 예능 시청률과 회차 예측 및 영향요인 분석)

  • Kim, Milim;Lim, Soyeon;Jang, Chohee;Song, Jongwoo
    • The Korean Journal of Applied Statistics
    • /
    • v.30 no.6
    • /
    • pp.809-825
    • /
    • 2017
  • The number of TV entertainment shows is increasing. Competition among programs in the entertainment market is intensifying since cable channels air many entertainment TV shows. There is now a need for research on program ratings and the number of episodes. This study presents predictive models for entertainment TV show ratings and number of episodes. We use various data mining techniques such as linear regression, logistic regression, LASSO, random forests, gradient boosting, and support vector machine. The analysis results show that the average program ratings before the first broadcast is affected by broadcasting company, average ratings of the previous season, starting year and number of articles. The average program ratings after the first broadcast is influenced by the rating of the first broadcast, broadcasting company and program type. We also found that the predicted average ratings, starting year, type and broadcasting company are important variables in predicting of the number of episodes.

Prediction of Greenhouse Strawberry Production Using Machine Learning Algorithm (머신러닝 알고리즘을 이용한 온실 딸기 생산량 예측)

  • Kim, Na-eun;Han, Hee-sun;Arulmozhi, Elanchezhian;Moon, Byeong-eun;Choi, Yung-Woo;Kim, Hyeon-tae
    • Journal of Bio-Environment Control
    • /
    • v.31 no.1
    • /
    • pp.1-7
    • /
    • 2022
  • Strawberry is a stand-out cultivating fruit in Korea. The optimum production of strawberry is highly dependent on growing environment. Smart farm technology, and automatic monitoring and control system maintain a favorable environment for strawberry growth in greenhouses, as well as play an important role to improve production. Moreover, physiological parameters of strawberry plant and it is surrounding environment may allow to give an idea on production of strawberry. Therefore, this study intends to build a machine learning model to predict strawberry's yield, cultivated in greenhouse. The environmental parameter like as temperature, humidity and CO2 and physiological parameters such as length of leaves, number of flowers and fruits and chlorophyll content of 'Seolhyang' (widely growing strawberry cultivar in Korea) were collected from three strawberry greenhouses located in Sacheon of Gyeongsangnam-do during the period of 2019-2020. A predictive model, Lasso regression was designed and validated through 5-fold cross-validation. The current study found that performance of the Lasso regression model is good to predict the number of flowers and fruits, when the MAPE value are 0.511 and 0.488, respectively during the model validation. Overall, the present study demonstrates that using AI based regression model may be convenient for farms and agricultural companies to predict yield of crops with fewer input attributes.

Cox Model Improvement Using Residual Blocks in Neural Networks: A Study on the Predictive Model of Cervical Cancer Mortality (신경망 내 잔여 블록을 활용한 콕스 모델 개선: 자궁경부암 사망률 예측모형 연구)

  • Nang Kyeong Lee;Joo Young Kim;Ji Soo Tak;Hyeong Rok Lee;Hyun Ji Jeon;Jee Myung Yang;Seung Won Lee
    • The Transactions of the Korea Information Processing Society
    • /
    • v.13 no.6
    • /
    • pp.260-268
    • /
    • 2024
  • Cervical cancer is the fourth most common cancer in women worldwide, and more than 604,000 new cases were reported in 2020 alone, resulting in approximately 341,831 deaths. The Cox regression model is a major model widely adopted in cancer research, but considering the existence of nonlinear associations, it faces limitations due to linear assumptions. To address this problem, this paper proposes ResSurvNet, a new model that improves the accuracy of cervical cancer mortality prediction using ResNet's residual learning framework. This model showed accuracy that outperforms the DNN, CPH, CoxLasso, Cox Gradient Boost, and RSF models compared in this study. As this model showed accuracy that outperformed the DNN, CPH, CoxLasso, Cox Gradient Boost, and RSF models compared in this study, this excellent predictive performance demonstrates great value in early diagnosis and treatment strategy establishment in the management of cervical cancer patients and represents significant progress in the field of survival analysis.

Performance of Prediction Models for Diagnosing Severe Aortic Stenosis Based on Aortic Valve Calcium on Cardiac Computed Tomography: Incorporation of Radiomics and Machine Learning

  • Nam gyu Kang;Young Joo Suh;Kyunghwa Han;Young Jin Kim;Byoung Wook Choi
    • Korean Journal of Radiology
    • /
    • v.22 no.3
    • /
    • pp.334-343
    • /
    • 2021
  • Objective: We aimed to develop a prediction model for diagnosing severe aortic stenosis (AS) using computed tomography (CT) radiomics features of aortic valve calcium (AVC) and machine learning (ML) algorithms. Materials and Methods: We retrospectively enrolled 408 patients who underwent cardiac CT between March 2010 and August 2017 and had echocardiographic examinations (240 patients with severe AS on echocardiography [the severe AS group] and 168 patients without severe AS [the non-severe AS group]). Data were divided into a training set (312 patients) and a validation set (96 patients). Using non-contrast-enhanced cardiac CT scans, AVC was segmented, and 128 radiomics features for AVC were extracted. After feature selection was performed with three ML algorithms (least absolute shrinkage and selection operator [LASSO], random forests [RFs], and eXtreme Gradient Boosting [XGBoost]), model classifiers for diagnosing severe AS on echocardiography were developed in combination with three different model classifier methods (logistic regression, RF, and XGBoost). The performance (c-index) of each radiomics prediction model was compared with predictions based on AVC volume and score. Results: The radiomics scores derived from LASSO were significantly different between the severe AS and non-severe AS groups in the validation set (median, 1.563 vs. 0.197, respectively, p < 0.001). A radiomics prediction model based on feature selection by LASSO + model classifier by XGBoost showed the highest c-index of 0.921 (95% confidence interval [CI], 0.869-0.973) in the validation set. Compared to prediction models based on AVC volume and score (c-indexes of 0.894 [95% CI, 0.815-0.948] and 0.899 [95% CI, 0.820-0.951], respectively), eight and three of the nine radiomics prediction models showed higher discrimination abilities for severe AS. However, the differences were not statistically significant (p > 0.05 for all). Conclusion: Models based on the radiomics features of AVC and ML algorithms may perform well for diagnosing severe AS, but the added value compared to AVC volume and score should be investigated further.

Estimation for misclassified data with ultra-high levels

  • Kang, Moonsu
    • Journal of the Korean Data and Information Science Society
    • /
    • v.27 no.1
    • /
    • pp.217-223
    • /
    • 2016
  • Outcome misclassification is widespread in classification problems, but methods to account for it are rarely used. In this paper, the problem of inference with misclassified multinomial logit data with a large number of multinomial parameters is addressed. We have had a significant swell of interest in the development of novel methods to infer misclassified data. One simulation study is shown regarding how seriously misclassification issue occurs if the number of categories increase. Then, using the group lasso regression, we will show how the best model should be fitted for that kind of multinomial regression problems comprehensively.

Pure additive contribution of genetic variants to a risk prediction model using propensity score matching: application to type 2 diabetes

  • Park, Chanwoo;Jiang, Nan;Park, Taesung
    • Genomics & Informatics
    • /
    • v.17 no.4
    • /
    • pp.47.1-47.12
    • /
    • 2019
  • The achievements of genome-wide association studies have suggested ways to predict diseases, such as type 2 diabetes (T2D), using single-nucleotide polymorphisms (SNPs). Most T2D risk prediction models have used SNPs in combination with demographic variables. However, it is difficult to evaluate the pure additive contribution of genetic variants to classically used demographic models. Since prediction models include some heritable traits, such as body mass index, the contribution of SNPs using unmatched case-control samples may be underestimated. In this article, we propose a method that uses propensity score matching to avoid underestimation by matching case and control samples, thereby determining the pure additive contribution of SNPs. To illustrate the proposed propensity score matching method, we used SNP data from the Korea Association Resources project and reported SNPs from the genome-wide association study catalog. We selected various SNP sets via stepwise logistic regression (SLR), least absolute shrinkage and selection operator (LASSO), and the elastic-net (EN) algorithm. Using these SNP sets, we made predictions using SLR, LASSO, and EN as logistic regression modeling techniques. The accuracy of the predictions was compared in terms of area under the receiver operating characteristic curve (AUC). The contribution of SNPs to T2D was evaluated by the difference in the AUC between models using only demographic variables and models that included the SNPs. The largest difference among our models showed that the AUC of the model using genetic variants with demographic variables could be 0.107 higher than that of the corresponding model using only demographic variables.

A Study for the Drivers of Movie Box-office Performance (영화흥행 영향요인 선택에 관한 연구)

  • Kim, Yon Hyong;Hong, Jeong Han
    • The Korean Journal of Applied Statistics
    • /
    • v.26 no.3
    • /
    • pp.441-452
    • /
    • 2013
  • This study analyzed the relationship between key film and a box office record success factors based on movies released in the first quarter of 2013 in Korea. An over-fitting problem can happen if there are too many explanatory variables inserted to regression model; in addition, there is a risk that the estimator is instable when there is multi-collinearity among the explanatory variables. For this reason, optimal variable selection based on high explanatory variables in box-office performance is of importance. Among the numerous ways to select variables, LASSO estimation applied by a generalized linear model has the smallest prediction error that can efficiently and quickly find variables with the highest explanatory power to box-office performance in order.

On sampling algorithms for imbalanced binary data: performance comparison and some caveats (불균형적인 이항 자료 분석을 위한 샘플링 알고리즘들: 성능비교 및 주의점)

  • Kim, HanYong;Lee, Woojoo
    • The Korean Journal of Applied Statistics
    • /
    • v.30 no.5
    • /
    • pp.681-690
    • /
    • 2017
  • Various imbalanced binary classification problems exist such as fraud detection in banking operations, detecting spam mail and predicting defective products. Several sampling methods such as over sampling, under sampling, SMOTE have been developed to overcome the poor prediction performance of binary classifiers when the proportion of one group is dominant. In order to overcome this problem, several sampling methods such as over-sampling, under-sampling, SMOTE have been developed. In this study, we investigate prediction performance of logistic regression, Lasso, random forest, boosting and support vector machine in combination with the sampling methods for binary imbalanced data. Four real data sets are analyzed to see if there is a substantial improvement in prediction performance. We also emphasize some precautions when the sampling methods are implemented.

Weighted L1-Norm Support Vector Machine for the Classification of Highly Imbalanced Data (불균형 자료의 분류분석을 위한 가중 L1-norm SVM)

  • Kim, Eunkyung;Jhun, Myoungshic;Bang, Sungwan
    • The Korean Journal of Applied Statistics
    • /
    • v.28 no.1
    • /
    • pp.9-21
    • /
    • 2015
  • The support vector machine has been successfully applied to various classification areas due to its flexibility and a high level of classification accuracy. However, when analyzing imbalanced data with uneven class sizes, the classification accuracy of SVM may drop significantly in predicting minority class because the SVM classifiers are undesirably biased toward the majority class. The weighted $L_2$-norm SVM was developed for the analysis of imbalanced data; however, it cannot identify irrelevant input variables due to the characteristics of the ridge penalty. Therefore, we propose the weighted $L_1$-norm SVM, which uses lasso penalty to select important input variables and weights to differentiate the misclassification of data points between classes. We demonstrate the satisfactory performance of the proposed method through simulation studies and a real data analysis.