• Title/Summary/Keyword: lasso

Search Result 173, Processing Time 0.03 seconds

Value at Risk calculation using sparse vine copula models (성근 바인 코풀라 모형을 이용한 고차원 금융 자료의 VaR 추정)

  • An, Kwangjoon;Baek, Changryong
    • The Korean Journal of Applied Statistics
    • /
    • v.34 no.6
    • /
    • pp.875-887
    • /
    • 2021
  • Value at Risk (VaR) is the most popular measure for market risk. In this paper, we consider the VaR estimation of portfolio consisting of a variety of assets based on multivariate copula model known as vine copula. In particular, sparse vine copula which penalizes too many parameters is considered. We show in the simulation study that sparsity indeed improves out-of-sample forecasting of VaR. Empirical analysis on 60 KOSPI stocks during the last 5 years also demonstrates that sparse vine copula outperforms regular copula model.

HisCoM-mimi: software for hierarchical structural component analysis for miRNA-mRNA integration model for binary phenotypes

  • Kim, Yongkang;Park, Taesung
    • Genomics & Informatics
    • /
    • v.17 no.1
    • /
    • pp.10.1-10.3
    • /
    • 2019
  • To identify miRNA-mRNA interaction pairs associated with binary phenotypes, we propose a hierarchical structural component model for miRNA-mRNA integration (HisCoM-mimi). Information on known mRNA targets provided by TargetScan is used to perform HisCoM-mimi. However, multiple databases can be used to find miRNA-mRNA signatures with known biological information through different algorithms. To take these additional databases into account, we present our advanced application software for HisCoM-mimi for binary phenotypes. The proposed HisCoM-mimi supports both TargetScan and miRTarBase, which provides manually-verified information initially gathered by text-mining the literature. By integrating information from miRTarBase into HisCoM-mimi, a broad range of target information derived from the research literature can be analyzed. Another improvement of the new HisCoM-mimi approach is the inclusion of updated algorithms to provide the lasso and elastic-net penalties for users who want to fit a model with a smaller number of selected miRNAs and mRNAs. We expect that our HisCoM-mimi software will make advanced methods accessible to researchers who want to identify miRNA-mRNA interaction pairs related with binary phenotypes.

Application of Regularized Linear Regression Models Using Public Domain data for Cycle Life Prediction of Commercial Lithium-Ion Batteries (상업용 리튬 배터리의 수명 예측을 위한 고속대량충방전 데이터 정규화 선형회귀모델의 적용)

  • KIM, JANG-GOON;LEE, JONG-SOOK
    • Journal of Hydrogen and New Energy
    • /
    • v.32 no.6
    • /
    • pp.592-611
    • /
    • 2021
  • In this study a rarely available high-throughput cycling data set of 124 commercial lithium iron phosphate/graphite cells cycled under fast-charging conditions, with widely varying cycle lives ranging from 150 to 2,300 cycles including in-cycle temperature and per-cycle IR measurements. We worked out own Python codes which reproduced the various data plots and machine learning approaches for cycle life prediction using early cycles and more details not presented in the article and the supplementary information. Particularly, we applied regularized ridge, lasso and elastic net linear regression models using features extracted from capacity fade curves, discharge voltage curves, and other data such as internal resistance and cell can temperature. We found that due to the limitation in the quantity and quality of the data from costly and lengthy battery testing a careful hyperparameter tuning may be required and that model features need to be extracted based on the domain knowledge.

Study on the Anthropometric and Body Composition Indices for Prediction of Cold and Heat Pattern

  • Mun, Sujeong;Park, Kihyun;Lee, Siwoo
    • The Journal of Korean Medicine
    • /
    • v.42 no.4
    • /
    • pp.185-196
    • /
    • 2021
  • Objectives: Many symptoms of cold and heat patterns are related to the thermoregulation of the body. Thus, we aimed to study the association of cold and heat patterns with anthropometry/body composition. Methods: The cold and heat patterns of 2000 individuals aged 30-55 years were evaluated using a self-administered questionnaire. Results: Among the anthropometric and body composition variables, body mass index (-0.37, 0.39) and fat mass index (-0.35, 0.38) had the highest correlation coefficients with the cold and heat pattern scores after adjustment for age and sex in the cold-heat group, while the correlation coefficients were relatively lower in the non-cold-heat group. In the cold-heat group, the most parsimonious model for the cold pattern with the variables selected by the best subset method and Lasso included sex, body mass index, waist-hip ratio, and extracellular water/total body water (adjusted R2 = 0.324), and the model for heat pattern additionally included age (adjusted R2 = 0.292). Conclusions: The variables related to obesity and water balance were the most useful for predicting cold and heat patterns. Further studies are required to improve the performance of prediction models.

Research on the Lesion Classification by Radiomics in Laryngoscopy Image (후두내시경 영상에서의 라디오믹스에 의한 병변 분류 연구)

  • Park, Jun Ha;Kim, Young Jae;Woo, Joo Hyun;Kim, Kwang Gi
    • Journal of Biomedical Engineering Research
    • /
    • v.43 no.5
    • /
    • pp.353-360
    • /
    • 2022
  • Laryngeal disease harms quality of life, and laryngoscopy is critical in identifying causative lesions. This study extracts and analyzes using radiomics quantitative features from the lesion in laryngoscopy images and will fit and validate a classifier for finding meaningful features. Searching the region of interest for lesions not classified by the YOLOv5 model, features are extracted with radionics. Selected the extracted features are through a combination of three feature selectors, and three estimator models. Through the selected features, trained and verified two classification models, Random Forest and Gradient Boosting, and found meaningful features. The combination of SFS, LASSO, and RF shows the highest performance with an accuracy of 0.90 and AUROC 0.96. Model using features to select by SFM, or RIDGE was low lower performance than other things. Classification of larynx lesions through radiomics looks effective. But it should use various feature selection methods and minimize data loss as losing color data.

Comparison of covariance thresholding methods in gene set analysis

  • Park, Sora;Kim, Kipoong;Sun, Hokeun
    • Communications for Statistical Applications and Methods
    • /
    • v.29 no.5
    • /
    • pp.591-601
    • /
    • 2022
  • In gene set analysis with microarray expression data, a group of genes such as a gene regulatory pathway and a signaling pathway is often tested if there exists either differentially expressed (DE) or differentially co-expressed (DC) genes between two biological conditions. Recently, a statistical test based on covariance estimation have been proposed in order to identify DC genes. In particular, covariance regularization by hard thresholding indeed improved the power of the test when the proportion of DC genes within a biological pathway is relatively small. In this article, we compare covariance thresholding methods using four different regularization penalties such as lasso, hard, smoothly clipped absolute deviation (SCAD), and minimax concave plus (MCP) penalties. In our extensive simulation studies, we found that both SCAD and MCP thresholding methods can outperform the hard thresholding method when the proportion of DC genes is extremely small and the number of genes in a biological pathway is much greater than a sample size. We also applied four thresholding methods to 3 different microarray gene expression data sets related with mutant p53 transcriptional activity, and epithelium and stroma breast cancer to compare genetic pathways identified by each method.

A Genome-Wide Analysis of Antibiotic Producing Genes in Streptomyces globisporus SP6C4

  • Kim, Da-Ran;Kwak, Youn-Sig
    • The Plant Pathology Journal
    • /
    • v.37 no.4
    • /
    • pp.389-395
    • /
    • 2021
  • Soil is the major source of plant-associated microbes. Several fungal and bacterial species live within plant tissues. Actinomycetes are well known for producing a variety of antibiotics, and they contribute to improving plant health. In our previous report, Streptomyces globisporus SP6C4 colonized plant tissues and was able to move to other tissues from the initially colonized ones. This strain has excellent antifungal and antibacterial activities and provides a suppressive effect upon various plant diseases. Here, we report the genome-wide analysis of antibiotic producing genes in S. globisporus SP6C4. A total of 15 secondary metabolite biosynthetic gene clusters were predicted using antiSMASH. We used the CRISPR/Cas9 mutagenesis system, and each biosynthetic gene was predicted via protein basic local alignment search tool (BLAST) and rapid annotation using subsystems technology (RAST) server. Three gene clusters were shown to exhibit antifungal or antibacterial activity, viz. cluster 16 (lasso peptide), cluster 17 (thiopeptide-lantipeptide), and cluster 20 (lantipeptide). The results of the current study showed that SP6C4 has a variety of antimicrobial activities, and this strain is beneficial in agriculture.

Modelling the deflection of reinforced concrete beams using the improved artificial neural network by imperialist competitive optimization

  • Li, Ning;Asteris, Panagiotis G.;Tran, Trung-Tin;Pradhan, Biswajeet;Nguyen, Hoang
    • Steel and Composite Structures
    • /
    • v.42 no.6
    • /
    • pp.733-745
    • /
    • 2022
  • This study proposed a robust artificial intelligence (AI) model based on the social behaviour of the imperialist competitive algorithm (ICA) and artificial neural network (ANN) for modelling the deflection of reinforced concrete beams, abbreviated as ICA-ANN model. Accordingly, the ICA was used to adjust and optimize the parameters of an ANN model (i.e., weights and biases) aiming to improve the accuracy of the ANN model in modelling the deflection reinforced concrete beams. A total of 120 experimental datasets of reinforced concrete beams were employed for this aim. Therein, applied load, tensile reinforcement strength and the reinforcement percentage were used to simulate the deflection of reinforced concrete beams. Besides, five other AI models, such as ANN, SVM (support vector machine), GLMNET (lasso and elastic-net regularized generalized linear models), CART (classification and regression tree) and KNN (k-nearest neighbours), were also used for the comprehensive assessment of the proposed model (i.e., ICA-ANN). The comparison of the derived results with the experimental findings demonstrates that among the developed models the ICA-ANN model is that can approximate the reinforced concrete beams deflection in a more reliable and robust manner.

A Pre-processing Process Using TadGAN-based Time-series Anomaly Detection (TadGAN 기반 시계열 이상 탐지를 활용한 전처리 프로세스 연구)

  • Lee, Seung Hoon;Kim, Yong Soo
    • Journal of Korean Society for Quality Management
    • /
    • v.50 no.3
    • /
    • pp.459-471
    • /
    • 2022
  • Purpose: The purpose of this study was to increase prediction accuracy for an anomaly interval identified using an artificial intelligence-based time series anomaly detection technique by establishing a pre-processing process. Methods: Significant variables were extracted by applying feature selection techniques, and anomalies were derived using the TadGAN time series anomaly detection algorithm. After applying machine learning and deep learning methodologies using normal section data (excluding anomaly sections), the explanatory power of the anomaly sections was demonstrated through performance comparison. Results: The results of the machine learning methodology, the performance was the best when SHAP and TadGAN were applied, and the results in the deep learning, the performance was excellent when Chi-square Test and TadGAN were applied. Comparing each performance with the papers applied with a Conventional methodology using the same data, it can be seen that the performance of the MLR was significantly improved to 15%, Random Forest to 24%, XGBoost to 30%, Lasso Regression to 73%, LSTM to 17% and GRU to 19%. Conclusion: Based on the proposed process, when detecting unsupervised learning anomalies of data that are not actually labeled in various fields such as cyber security, financial sector, behavior pattern field, SNS. It is expected to prove the accuracy and explanation of the anomaly detection section and improve the performance of the model.

A Study on the Application of Measurement Data Using Machine Learning Regression Models

  • Yun-Seok Seo;Young-Gon Kim
    • International journal of advanced smart convergence
    • /
    • v.12 no.2
    • /
    • pp.47-55
    • /
    • 2023
  • The automotive industry is undergoing a paradigm shift due to the convergence of IT and rapid digital transformation. Various components, including embedded structures and systems with complex architectures that incorporate IC semiconductors, are being integrated and modularized. As a result, there has been a significant increase in vehicle defects, raising expectations for the quality of automotive parts. As more and more data is being accumulated, there is an active effort to go beyond traditional reliability analysis methods and apply machine learning models based on the accumulated big data. However, there are still not many cases where machine learning is used in product development to identify factors of defects in performance and durability of products and incorporate feedback into the design to improve product quality. In this paper, we applied a prediction algorithm to the defects of automotive door devices equipped with automatic responsive sensors, which are commonly installed in recent electric and hydrogen vehicles. To do so, we selected test items, built a measurement emulation system for data acquisition, and conducted comparative evaluations by applying different machine learning algorithms to the measured data. The results in terms of R2 score were as follows: Ordinary multiple regression 0.96, Ridge regression 0.95, Lasso regression 0.89, Elastic regression 0.91.