International Journal of Computer Science & Network Security
/
v.21
no.12spc
/
pp.526-538
/
2021
Machine and deep learning-based models are emerging techniques that are being used to address prediction problems in biomedical data analysis. DNA sequence prediction is a critical problem that has attracted a great deal of attention in the biomedical domain. Machine and deep learning-based models have been shown to provide more accurate results when compared to conventional regression-based models. The prediction of the gene sequence that leads to cancerous diseases, such as prostate cancer, is crucial. Identifying the most important features in a gene sequence is a challenging task. Extracting the components of the gene sequence that can provide an insight into the types of mutation in the gene is of great importance as it will lead to effective drug design and the promotion of the new concept of personalised medicine. In this work, we extracted the exons in the prostate gene sequences that were used in the experiment. We built a Deep Neural Network (DNN) and Bi-directional Long-Short Term Memory (Bi-LSTM) model using a k-mer encoding for the DNA sequence and one-hot encoding for the class label. The models were evaluated using different classification metrics. Our experimental results show that DNN model prediction offers a training accuracy of 99 percent and validation accuracy of 96 percent. The bi-LSTM model also has a training accuracy of 95 percent and validation accuracy of 91 percent.
Various monitoring systems have been implemented in civil infrastructure to ensure structural safety and integrity. In long-term monitoring, these systems generate a large amount of data, where anomalies are not unusual and can pose unique challenges for structural health monitoring applications, such as system identification and damage detection. Therefore, developing efficient techniques is quite essential to recognize the anomalies in monitoring data. In this study, several machine learning techniques are explored and implemented to detect and classify various types of data anomalies. A field dataset, which consists of one month long acceleration data obtained from a long-span cable-stayed bridge in China, is employed to examine the machine learning techniques for automated data anomaly detection. These techniques include the statistic-based pattern recognition network, spectrogram-based convolutional neural network, image-based time history convolutional neural network, image-based time-frequency hybrid convolution neural network (GoogLeNet), and proposed ensemble neural network model. The ensemble model deliberately combines different machine learning models to enhance anomaly classification performance. The results show that all these techniques can successfully detect and classify six types of data anomalies (i.e., missing, minor, outlier, square, trend, drift). Moreover, both image-based time history convolutional neural network and GoogLeNet are further investigated for the capability of autonomous online anomaly classification and found to effectively classify anomalies with decent performance. As seen in comparison with accuracy, the proposed ensemble neural network model outperforms the other three machine learning techniques. This study also evaluates the proposed ensemble neural network model to a blind test dataset. As found in the results, this ensemble model is effective for data anomaly detection and applicable for the signal characteristics changing over time.
Journal of the Korea Institute of Information Security & Cryptology
/
v.32
no.2
/
pp.267-277
/
2022
Due to the increasing proportion of cloud and remote working environments, various information security incidents are occurring. Insider threats have emerged as a major issue, with cases in which corporate insiders attempting to leak confidential data by accessing it remotely. In response, insider threat detection approaches based on machine learning have been developed. However, existing machine learning methods used to detect insider threats do not take biases and variances into account, which leads to limited performance. In this paper, boosting-type ensemble learning algorithms are applied to verify the performance of malicious insider detection, conduct a close analysis, and even consider the imbalance in datasets to determine the final result. Through experiments, we show that using ensemble learning achieves similar or higher accuracy to other existing malicious insider detection approaches while considering bias-variance tradeoff. The experimental results show that ensemble learning using bagging and boosting methods reached an accuracy of over 98%, which improves malicious insider detection performance by 5.62% compared to the average accuracy of single learning models used.
Satyam Tiwari;Sarat K. Das;Madhumita Mohanty;Prakhar
Geomechanics and Engineering
/
v.37
no.5
/
pp.475-498
/
2024
The prediction of the susceptibility of soil to liquefaction using a limited set of parameters, particularly when dealing with highly unbalanced databases is a challenging problem. The current study focuses on different ensemble learning classification algorithms using highly unbalanced databases of results from in-situ tests; standard penetration test (SPT), shear wave velocity (Vs) test, and cone penetration test (CPT). The input parameters for these datasets consist of earthquake intensity parameters, strong ground motion parameters, and in-situ soil testing parameters. liquefaction index serving as the binary output parameter. After a rigorous comparison with existing literature, extreme gradient boosting (XGBoost), bagging, and random forest (RF) emerge as the most efficient models for liquefaction instance classification across different datasets. Notably, for SPT and Vs-based models, XGBoost exhibits superior performance, followed by Light gradient boosting machine (LightGBM) and Bagging, while for CPT-based models, Bagging ranks highest, followed by Gradient boosting and random forest, with CPT-based models demonstrating lower Gmean(error), rendering them preferable for soil liquefaction susceptibility prediction. Key parameters influencing model performance include internal friction angle of soil (ϕ) and percentage of fines less than 75 µ (F75) for SPT and Vs data and normalized average cone tip resistance (qc) and peak horizontal ground acceleration (amax) for CPT data. It was also observed that the addition of Vs measurement to SPT data increased the efficiency of the prediction in comparison to only SPT data. Furthermore, to enhance usability, a graphical user interface (GUI) for seamless classification operations based on provided input parameters was proposed.
Malicious hate speech and gender bias comments are common in online communities, causing social problems in our society. Gender bias and hate speech detection has been investigated. However, it is difficult because there are diverse ways to express them in words. To solve this problem, we attempted to detect malicious comments in a Korean hate speech dataset constructed in 2020. We explored bidirectional encoder representations from transformers (BERT)-based deep learning models utilizing hyperparameter tuning, data sampling, and logits ensembles with a label distribution. We evaluated our model in Kaggle competitions for gender bias, general bias, and hate speech detection. For gender bias detection, an F1-score of 0.7711 was achieved using an ensemble of the Soongsil-BERT and KcELECTRA models. The general bias task included the gender bias task, and the ensemble model achieved the best F1-score of 0.7166.
Journal of the Computational Structural Engineering Institute of Korea
/
v.36
no.1
/
pp.9-18
/
2023
Predicting the compressive strength of high-performance concrete (HPC) is challenging because of the use of additional cementitious materials; thus, the development of improved predictive models is essential. The purpose of this study was to develop an HPC compressive-strength prediction model using an ensemble machine-learning method of combined bagging and stacking techniques. The result is a new ensemble technique that integrates the existing ensemble methods of bagging and stacking to solve the problems of a single machine-learning model and improve the prediction performance of the model. The nonlinear regression, support vector machine, artificial neural network, and Gaussian process regression approaches were used as single machine-learning methods and bagging and stacking techniques as ensemble machine-learning methods. As a result, the model of the proposed method showed improved accuracy results compared with single machine-learning models, an individual bagging technique model, and a stacking technique model. This was confirmed through a comparison of four representative performance indicators, verifying the effectiveness of the method.
In recent years, graph neural networks (GNNs) have been extensively used to analyze graph data across various domains because of their powerful capabilities in learning complex graph-structured data. However, recent research has focused on improving the performance of a single GNN with only two or three layers. This is because stacking layers deeply causes the over-smoothing problem of GNNs, which degrades the performance of GNNs significantly. On the other hand, ensemble methods combine individual weak models to obtain better generalization performance. Among them, gradient boosting is a powerful supervised learning algorithm that adds new weak models in the direction of reducing the errors of the previously created weak models. After repeating this process, gradient boosting combines the weak models to produce a strong model with better performance. Until now, most studies on GNNs have focused on improving the performance of a single GNN. In contrast, improving the performance of GNNs using multiple GNNs has not been studied much yet. In this paper, we propose gradient boosted graph neural networks (GBGNN) that combine multiple shallow GNNs with gradient boosting. We use shallow GNNs as weak models and create new weak models using the proposed gradient boosting-based loss function. Our empirical evaluations on three real-world datasets demonstrate that GBGNN performs much better than a single GNN. Specifically, in our experiments using graph convolutional network (GCN) and graph attention network (GAT) as weak models on the Cora dataset, GBGNN achieves performance improvements of 12.3%p and 6.1%p in node classification accuracy compared to a single GCN and a single GAT, respectively.
Journal of the Korea Society of Computer and Information
/
v.26
no.7
/
pp.37-44
/
2021
Automatic classification of brain MRI images play an important role in early diagnosis of brain tumors. In this work, we present a deep learning-based brain tumor classification model in MRI images using ensemble of deep features. In our proposed framework, three different deep features from brain MR image are extracted using three different pre-trained models. After that, the extracted deep features are fed to the classification module. In the classification module, the three different deep features are first fed into the fully-connected layers individually to reduce the dimension of the features. After that, the output features from the fully-connected layers are concatenated and fed into the fully-connected layer to predict the final output. To evaluate our proposed model, we use openly accessible brain MRI dataset from web. Experimental results show that our proposed model outperforms other machine learning-based models.
Dorjsembe, Uyanga;Lee, Ju Hong;Choi, Bumghi;Song, Jae Won
Proceedings of the Korea Information Processing Society Conference
/
2021.05a
/
pp.373-376
/
2021
Deep neural networks have achieved almost human-level results in various tasks and have become popular in the broad artificial intelligence domains. Uncertainty estimation is an on-demand task caused by the black-box point estimation behavior of deep learning. The deep ensemble provides increased accuracy and estimated uncertainty; however, linearly increasing the size makes the deep ensemble unfeasible for memory-intensive tasks. To address this problem, we used model pruning and quantization with a deep ensemble and analyzed the effect in the context of uncertainty metrics. We empirically showed that the ensemble members' disagreement increases with pruning, making models sparser by zeroing irrelevant parameters. Increased disagreement implies increased uncertainty, which helps in making more robust predictions. Accordingly, an energy-efficient compressed deep ensemble is appropriate for memory-intensive and uncertainty-aware tasks.
This study uses corporate data from 2012 to 2018 when K-IFRS was applied in earnest to predict default risks. The data used in the analysis totaled 10,545 rows, consisting of 160 columns including 38 in the statement of financial position, 26 in the statement of comprehensive income, 11 in the statement of cash flows, and 76 in the index of financial ratios. Unlike most previous prior studies used the default event as the basis for learning about default risk, this study calculated default risk using the market capitalization and stock price volatility of each company based on the Merton model. Through this, it was able to solve the problem of data imbalance due to the scarcity of default events, which had been pointed out as the limitation of the existing methodology, and the problem of reflecting the difference in default risk that exists within ordinary companies. Because learning was conducted only by using corporate information available to unlisted companies, default risks of unlisted companies without stock price information can be appropriately derived. Through this, it can provide stable default risk assessment services to unlisted companies that are difficult to determine proper default risk with traditional credit rating models such as small and medium-sized companies and startups. Although there has been an active study of predicting corporate default risks using machine learning recently, model bias issues exist because most studies are making predictions based on a single model. Stable and reliable valuation methodology is required for the calculation of default risk, given that the entity's default risk information is very widely utilized in the market and the sensitivity to the difference in default risk is high. Also, Strict standards are also required for methods of calculation. The credit rating method stipulated by the Financial Services Commission in the Financial Investment Regulations calls for the preparation of evaluation methods, including verification of the adequacy of evaluation methods, in consideration of past statistical data and experiences on credit ratings and changes in future market conditions. This study allowed the reduction of individual models' bias by utilizing stacking ensemble techniques that synthesize various machine learning models. This allows us to capture complex nonlinear relationships between default risk and various corporate information and maximize the advantages of machine learning-based default risk prediction models that take less time to calculate. To calculate forecasts by sub model to be used as input data for the Stacking Ensemble model, training data were divided into seven pieces, and sub-models were trained in a divided set to produce forecasts. To compare the predictive power of the Stacking Ensemble model, Random Forest, MLP, and CNN models were trained with full training data, then the predictive power of each model was verified on the test set. The analysis showed that the Stacking Ensemble model exceeded the predictive power of the Random Forest model, which had the best performance on a single model. Next, to check for statistically significant differences between the Stacking Ensemble model and the forecasts for each individual model, the Pair between the Stacking Ensemble model and each individual model was constructed. Because the results of the Shapiro-wilk normality test also showed that all Pair did not follow normality, Using the nonparametric method wilcoxon rank sum test, we checked whether the two model forecasts that make up the Pair showed statistically significant differences. The analysis showed that the forecasts of the Staging Ensemble model showed statistically significant differences from those of the MLP model and CNN model. In addition, this study can provide a methodology that allows existing credit rating agencies to apply machine learning-based bankruptcy risk prediction methodologies, given that traditional credit rating models can also be reflected as sub-models to calculate the final default probability. Also, the Stacking Ensemble techniques proposed in this study can help design to meet the requirements of the Financial Investment Business Regulations through the combination of various sub-models. We hope that this research will be used as a resource to increase practical use by overcoming and improving the limitations of existing machine learning-based models.
본 웹사이트에 게시된 이메일 주소가 전자우편 수집 프로그램이나
그 밖의 기술적 장치를 이용하여 무단으로 수집되는 것을 거부하며,
이를 위반시 정보통신망법에 의해 형사 처벌됨을 유념하시기 바랍니다.
[게시일 2004년 10월 1일]
이용약관
제 1 장 총칙
제 1 조 (목적)
이 이용약관은 KoreaScience 홈페이지(이하 “당 사이트”)에서 제공하는 인터넷 서비스(이하 '서비스')의 가입조건 및 이용에 관한 제반 사항과 기타 필요한 사항을 구체적으로 규정함을 목적으로 합니다.
제 2 조 (용어의 정의)
① "이용자"라 함은 당 사이트에 접속하여 이 약관에 따라 당 사이트가 제공하는 서비스를 받는 회원 및 비회원을
말합니다.
② "회원"이라 함은 서비스를 이용하기 위하여 당 사이트에 개인정보를 제공하여 아이디(ID)와 비밀번호를 부여
받은 자를 말합니다.
③ "회원 아이디(ID)"라 함은 회원의 식별 및 서비스 이용을 위하여 자신이 선정한 문자 및 숫자의 조합을
말합니다.
④ "비밀번호(패스워드)"라 함은 회원이 자신의 비밀보호를 위하여 선정한 문자 및 숫자의 조합을 말합니다.
제 3 조 (이용약관의 효력 및 변경)
① 이 약관은 당 사이트에 게시하거나 기타의 방법으로 회원에게 공지함으로써 효력이 발생합니다.
② 당 사이트는 이 약관을 개정할 경우에 적용일자 및 개정사유를 명시하여 현행 약관과 함께 당 사이트의
초기화면에 그 적용일자 7일 이전부터 적용일자 전일까지 공지합니다. 다만, 회원에게 불리하게 약관내용을
변경하는 경우에는 최소한 30일 이상의 사전 유예기간을 두고 공지합니다. 이 경우 당 사이트는 개정 전
내용과 개정 후 내용을 명확하게 비교하여 이용자가 알기 쉽도록 표시합니다.
제 4 조(약관 외 준칙)
① 이 약관은 당 사이트가 제공하는 서비스에 관한 이용안내와 함께 적용됩니다.
② 이 약관에 명시되지 아니한 사항은 관계법령의 규정이 적용됩니다.
제 2 장 이용계약의 체결
제 5 조 (이용계약의 성립 등)
① 이용계약은 이용고객이 당 사이트가 정한 약관에 「동의합니다」를 선택하고, 당 사이트가 정한
온라인신청양식을 작성하여 서비스 이용을 신청한 후, 당 사이트가 이를 승낙함으로써 성립합니다.
② 제1항의 승낙은 당 사이트가 제공하는 과학기술정보검색, 맞춤정보, 서지정보 등 다른 서비스의 이용승낙을
포함합니다.
제 6 조 (회원가입)
서비스를 이용하고자 하는 고객은 당 사이트에서 정한 회원가입양식에 개인정보를 기재하여 가입을 하여야 합니다.
제 7 조 (개인정보의 보호 및 사용)
당 사이트는 관계법령이 정하는 바에 따라 회원 등록정보를 포함한 회원의 개인정보를 보호하기 위해 노력합니다. 회원 개인정보의 보호 및 사용에 대해서는 관련법령 및 당 사이트의 개인정보 보호정책이 적용됩니다.
제 8 조 (이용 신청의 승낙과 제한)
① 당 사이트는 제6조의 규정에 의한 이용신청고객에 대하여 서비스 이용을 승낙합니다.
② 당 사이트는 아래사항에 해당하는 경우에 대해서 승낙하지 아니 합니다.
- 이용계약 신청서의 내용을 허위로 기재한 경우
- 기타 규정한 제반사항을 위반하며 신청하는 경우
제 9 조 (회원 ID 부여 및 변경 등)
① 당 사이트는 이용고객에 대하여 약관에 정하는 바에 따라 자신이 선정한 회원 ID를 부여합니다.
② 회원 ID는 원칙적으로 변경이 불가하며 부득이한 사유로 인하여 변경 하고자 하는 경우에는 해당 ID를
해지하고 재가입해야 합니다.
③ 기타 회원 개인정보 관리 및 변경 등에 관한 사항은 서비스별 안내에 정하는 바에 의합니다.
제 3 장 계약 당사자의 의무
제 10 조 (KISTI의 의무)
① 당 사이트는 이용고객이 희망한 서비스 제공 개시일에 특별한 사정이 없는 한 서비스를 이용할 수 있도록
하여야 합니다.
② 당 사이트는 개인정보 보호를 위해 보안시스템을 구축하며 개인정보 보호정책을 공시하고 준수합니다.
③ 당 사이트는 회원으로부터 제기되는 의견이나 불만이 정당하다고 객관적으로 인정될 경우에는 적절한 절차를
거쳐 즉시 처리하여야 합니다. 다만, 즉시 처리가 곤란한 경우는 회원에게 그 사유와 처리일정을 통보하여야
합니다.
제 11 조 (회원의 의무)
① 이용자는 회원가입 신청 또는 회원정보 변경 시 실명으로 모든 사항을 사실에 근거하여 작성하여야 하며,
허위 또는 타인의 정보를 등록할 경우 일체의 권리를 주장할 수 없습니다.
② 당 사이트가 관계법령 및 개인정보 보호정책에 의거하여 그 책임을 지는 경우를 제외하고 회원에게 부여된
ID의 비밀번호 관리소홀, 부정사용에 의하여 발생하는 모든 결과에 대한 책임은 회원에게 있습니다.
③ 회원은 당 사이트 및 제 3자의 지적 재산권을 침해해서는 안 됩니다.
제 4 장 서비스의 이용
제 12 조 (서비스 이용 시간)
① 서비스 이용은 당 사이트의 업무상 또는 기술상 특별한 지장이 없는 한 연중무휴, 1일 24시간 운영을
원칙으로 합니다. 단, 당 사이트는 시스템 정기점검, 증설 및 교체를 위해 당 사이트가 정한 날이나 시간에
서비스를 일시 중단할 수 있으며, 예정되어 있는 작업으로 인한 서비스 일시중단은 당 사이트 홈페이지를
통해 사전에 공지합니다.
② 당 사이트는 서비스를 특정범위로 분할하여 각 범위별로 이용가능시간을 별도로 지정할 수 있습니다. 다만
이 경우 그 내용을 공지합니다.
제 13 조 (홈페이지 저작권)
① NDSL에서 제공하는 모든 저작물의 저작권은 원저작자에게 있으며, KISTI는 복제/배포/전송권을 확보하고
있습니다.
② NDSL에서 제공하는 콘텐츠를 상업적 및 기타 영리목적으로 복제/배포/전송할 경우 사전에 KISTI의 허락을
받아야 합니다.
③ NDSL에서 제공하는 콘텐츠를 보도, 비평, 교육, 연구 등을 위하여 정당한 범위 안에서 공정한 관행에
합치되게 인용할 수 있습니다.
④ NDSL에서 제공하는 콘텐츠를 무단 복제, 전송, 배포 기타 저작권법에 위반되는 방법으로 이용할 경우
저작권법 제136조에 따라 5년 이하의 징역 또는 5천만 원 이하의 벌금에 처해질 수 있습니다.
제 14 조 (유료서비스)
① 당 사이트 및 협력기관이 정한 유료서비스(원문복사 등)는 별도로 정해진 바에 따르며, 변경사항은 시행 전에
당 사이트 홈페이지를 통하여 회원에게 공지합니다.
② 유료서비스를 이용하려는 회원은 정해진 요금체계에 따라 요금을 납부해야 합니다.
제 5 장 계약 해지 및 이용 제한
제 15 조 (계약 해지)
회원이 이용계약을 해지하고자 하는 때에는 [가입해지] 메뉴를 이용해 직접 해지해야 합니다.
제 16 조 (서비스 이용제한)
① 당 사이트는 회원이 서비스 이용내용에 있어서 본 약관 제 11조 내용을 위반하거나, 다음 각 호에 해당하는
경우 서비스 이용을 제한할 수 있습니다.
- 2년 이상 서비스를 이용한 적이 없는 경우
- 기타 정상적인 서비스 운영에 방해가 될 경우
② 상기 이용제한 규정에 따라 서비스를 이용하는 회원에게 서비스 이용에 대하여 별도 공지 없이 서비스 이용의
일시정지, 이용계약 해지 할 수 있습니다.
제 17 조 (전자우편주소 수집 금지)
회원은 전자우편주소 추출기 등을 이용하여 전자우편주소를 수집 또는 제3자에게 제공할 수 없습니다.
제 6 장 손해배상 및 기타사항
제 18 조 (손해배상)
당 사이트는 무료로 제공되는 서비스와 관련하여 회원에게 어떠한 손해가 발생하더라도 당 사이트가 고의 또는 과실로 인한 손해발생을 제외하고는 이에 대하여 책임을 부담하지 아니합니다.
제 19 조 (관할 법원)
서비스 이용으로 발생한 분쟁에 대해 소송이 제기되는 경우 민사 소송법상의 관할 법원에 제기합니다.
[부 칙]
1. (시행일) 이 약관은 2016년 9월 5일부터 적용되며, 종전 약관은 본 약관으로 대체되며, 개정된 약관의 적용일 이전 가입자도 개정된 약관의 적용을 받습니다.