• Title/Summary/Keyword: Machine Learning

Bayesian logit models with auxiliary mixture sampling for analyzing diabetes diagnosis data (보조 혼합 샘플링을 이용한 베이지안 로지스틱 회귀모형 : 당뇨병 자료에 적용 및 분류에서의 성능 비교)

  • Rhee, Eun Hee;Hwang, Beom Seuk
    • The Korean Journal of Applied Statistics / v.35 no.1 / pp.131-146 / 2022
  • Logit models are commonly used to predict and classify categorical response variables. Most Bayesian approaches to logit models are implemented with the Metropolis-Hastings algorithm. However, that algorithm suffers from slow convergence, and an adequate proposal distribution is difficult to ensure. Therefore, we use the auxiliary mixture sampler proposed by Frühwirth-Schnatter and Frühwirth (2007) to estimate logit models. This method introduces two sequences of auxiliary latent variables so that the logit model satisfies normality and linearity. As a result, the logit model can be easily implemented by Gibbs sampling. We applied the proposed method to diabetes data from the Community Health Survey (2020) of the Korea Disease Control and Prevention Agency and compared its performance with that of the Metropolis-Hastings algorithm. In addition, we showed that the logit model using auxiliary mixture sampling achieves classification performance comparable to that of machine learning models.

Effects of mining activities on Nano-soil management using artificial intelligence models of ANN and ELM

  • Liu, Qi;Peng, Kang;Zeng, Jie;Marzouki, Riadh;Majdi, Ali;Jan, Amin;Salameh, Anas A.;Assilzadeh, Hamid
    • Advances in nano research / v.12 no.6 / pp.549-566 / 2022
  • Mining of ore minerals (sphalerite, cinnabar, and chalcopyrite) from the old mine has led to significant environmental effects, such as contamination of soils and plants and acidification of water. Meanwhile, nanoparticles (NPs) have gained global importance because of their widespread use in daily life, their unique properties, and the rapid development of nanotechnology. Given their use in various fields, soil is suggested to be the final environmental sink for NPs. Nanoparticles with high reactivity and deliverability may be applied as amendments to enhance soil quality, mitigate soil contamination, ensure safe land application of conventional amendment materials, and improve soil erosion control. However, there is no record of the use of nano-enhanced materials for mine soil reclamation. In this study, five soil specimens were tested at four sites within the mine region (<100 m) to study zeolites and iron sulfide nanoparticles. In addition, an Artificial Neural Network (ANN) and an Extreme Learning Machine (ELM) were used to estimate the mechanical properties of soil under the effect of these nanoparticles. Considering the RMSE and R2 values, zeolite nanomaterials could improve mine soil quality by increasing the clay-silt fractions, increasing the water holding capacity, removing toxins, and improving nutrient levels. In contrast, adding iron sulfide minerals to the soils could exacerbate soil acidity problems at a mining site.

Synthetic data augmentation for pixel-wise steel fatigue crack identification using fully convolutional networks

  • Zhai, Guanghao;Narazaki, Yasutaka;Wang, Shuo;Shajihan, Shaik Althaf V.;Spencer, Billie F. Jr.
    • Smart Structures and Systems / v.29 no.1 / pp.237-250 / 2022
  • Structural health monitoring (SHM) plays an important role in ensuring the safety and functionality of critical civil infrastructure. In recent years, numerous researchers have developed computer vision and machine learning techniques for SHM, offering the potential to reduce the laborious nature and improve the effectiveness of field inspections. However, high-quality vision data from various types of damaged structures is relatively difficult to obtain because damaged structures are rare. The lack of data is particularly acute for fatigue cracks in steel bridge girders. As a result, the lack of training data is one of the main issues hindering wider application of these powerful techniques for SHM. To address this problem, this article proposes using synthetic data to augment the real-world datasets used to train neural networks that identify fatigue cracks in steel structures. First, random textures representing the surface of steel structures with fatigue cracks are created and mapped onto a 3D graphics model. Subsequently, this model is used to generate synthetic images for various lighting conditions and camera angles. A fully convolutional network is then trained for two cases: (1) using only real-world data, and (2) using both synthetic and real-world data. With synthetic data augmentation in the training process, the crack identification performance of the neural network on the test dataset improves from 35% to 40% for intersection over union (IoU) and from 49% to 62% for precision, demonstrating the efficacy of the proposed approach.
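
The two pixel-wise metrics quoted in this abstract, IoU and precision, can be computed from a predicted and a ground-truth binary mask. The sketch below is only an illustration: the function name, the tiny masks, and the convention used for an empty union are assumptions, not details taken from the paper.

```python
from typing import Sequence, Tuple


def iou_and_precision(pred: Sequence[Sequence[int]],
                      truth: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Return (IoU, precision) for two binary masks of equal shape (1 = crack pixel)."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # crack pixel correctly predicted
            elif p and not t:
                fp += 1          # predicted crack, actually background
            elif t and not p:
                fn += 1          # missed crack pixel
    union = tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    iou = tp / union if union else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return iou, precision


# Hypothetical 2x4 masks, for illustration only.
pred = [[0, 1, 1, 0],
        [0, 1, 0, 0]]
truth = [[0, 1, 0, 0],
         [0, 1, 1, 0]]
iou, precision = iou_and_precision(pred, truth)
print(f"IoU={iou:.2f}, precision={precision:.2f}")
```

Here two pixels overlap, one is a false positive, and one is missed, so IoU is 2/4 and precision is 2/3.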

Optimised neural network prediction of interface bond strength for GFRP tendon reinforced cemented soil

  • Zhang, Genbao;Chen, Changfu;Zhang, Yuhao;Zhao, Hongchao;Wang, Yufei;Wang, Xiangyu
    • Geomechanics and Engineering / v.28 no.6 / pp.599-611 / 2022
  • Tendon-reinforced cemented soil is applied extensively in foundation stabilisation and improvement, especially in areas with soft clay. To solve the deterioration problem caused by steel corrosion, the glass fiber-reinforced polymer (GFRP) tendon is introduced as a substitute for the traditional steel tendon. The interface bond strength between the cemented soil matrix and the GFRP tendon demonstrates the outstanding mechanical properties of this composite. However, the lack of research on the relationship between the influencing factors and bond strength hinders its application. To evaluate these factors, a back propagation neural network (BPNN) is applied to predict the relationship between them and bond strength. Since adjusting BPNN parameters is time-consuming and laborious, the particle swarm optimisation (PSO) algorithm is used to tune them. This study evaluated the influence of water content, cement content, curing time, and slip distance on the bond performance of GFRP tendon-reinforced cemented soils (GTRCS). The results showed that the ultimate and residual bond strengths both increased with cement content and decreased with water content. The sample cured for 28 days with 30% water content and 50% cement content had the largest ultimate strength (3879.40 kPa). The PSO-BPNN model was tuned with 3 neurons in the input layer, 10 in the hidden layer, and 1 in the output layer. It showed outstanding performance on a large database comprising 405 test results, achieving a higher correlation coefficient (0.908) and a lower root-mean-square error (239.11 kPa) than multiple linear regression (MLR) and logistic regression (LR). In addition, a sensitivity analysis was applied to rank the input variables. The results showed that cement content had the strongest influence on bond strength, followed by water content and slip displacement.

Stacked Sparse Autoencoder-DeepCNN Model Trained on CICIDS2017 Dataset for Network Intrusion Detection (네트워크 침입 탐지를 위해 CICIDS2017 데이터셋으로 학습한 Stacked Sparse Autoencoder-DeepCNN 모델)

  • Lee, Jong-Hwa;Kim, Jong-Wouk;Choi, Mi-Jung
    • KNOM Review / v.24 no.2 / pp.24-34 / 2021
  • Service providers using edge computing provide a high level of service. As a result, devices store important information in internal storage and have become a target of the latest cyberattacks, which are more difficult to detect. Although experts use security systems such as intrusion detection systems, existing intrusion detection systems have low detection accuracy. Therefore, in this paper, we propose a machine learning model for more accurate intrusion detection on devices in edge computing. The proposed model is a hybrid model that combines a stacked sparse autoencoder (SSAE) and a convolutional neural network (CNN) to extract important feature vectors from the input data using sparsity constraints. To find the optimal model, we compared and analyzed performance while adjusting the sparsity coefficient of the SSAE. As a result, the model achieved its highest accuracy, 96.9%, when using the sparsity constraints. In other words, the model performed best when it was trained only on important features.

Data-driven prediction of compressive strength of FRP-confined concrete members: An application of machine learning models

  • Berradia, Mohammed;Azab, Marc;Ahmad, Zeeshan;Accouche, Oussama;Raza, Ali;Alashker, Yasser
    • Structural Engineering and Mechanics / v.83 no.4 / pp.515-535 / 2022
  • The strength models available in the literature for fiber-reinforced polymer (FRP)-confined normal strength concrete (NC) cylinders were derived from small databases with a limited set of variables, which results in lower accuracy. The artificial neural network (ANN) is an advanced technique for precisely predicting the response of composite structures by considering a large number of parameters. The main objective of the present investigation is to develop an ANN model for the axial strength of FRP-confined NC cylinders that uses various parameters to give the most accurate predictions. To achieve this aim, a large experimental database of 313 FRP-confined NC cylinders was constructed from previous research investigations. Thirty-three different empirical strength models were evaluated over the developed database using various statistical parameters (root mean squared error RMSE, mean absolute error MAE, and coefficient of determination R2). Then, a new ANN model using the Group Method of Data Handling (GMDH) was proposed based on the experimental database; it showed the highest performance compared with the previous models, with R2=0.92, RMSE=0.27, and MAE=0.33. Therefore, the suggested ANN model can accurately capture the axial strength of FRP-confined NC cylinders and can be used for the further analysis and design of such members in the construction industry.
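
For reference, the three statistics named in this abstract (RMSE, MAE, and R2) follow directly from the prediction errors. The snippet below is a minimal sketch using the standard textbook definitions; the function name and the sample values are hypothetical and are not data from the paper.

```python
import math
from typing import Sequence, Tuple


def regression_metrics(y_true: Sequence[float],
                       y_pred: Sequence[float]) -> Tuple[float, float, float]:
    """Return (RMSE, MAE, R2) for paired observed and predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


# Hypothetical values, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]
rmse, mae, r2 = regression_metrics(y_true, y_pred)
print(f"RMSE={rmse:.2f}, MAE={mae:.2f}, R2={r2:.2f}")
```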

GCNXSS: An Attack Detection Approach for Cross-Site Scripting Based on Graph Convolutional Networks

  • Pan, Hongyu;Fang, Yong;Huang, Cheng;Guo, Wenbo;Wan, Xuelin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.12 / pp.4008-4023 / 2022
  • Since machine learning was introduced into cross-site scripting (XSS) attack detection, many researchers have conducted related studies and achieved significant results, such as saving time and labor costs by not having to maintain the rule database required by traditional XSS attack detection methods. However, these approaches have run into problems such as poor generalization ability and significant false negative rates (FNR) and false positive rates (FPR). Meanwhile, the automatic clustering property of graph convolutional networks (GCN) has attracted the attention of researchers. In the field of natural language processing (NLP), the results of GCN-based graph embedding are automatically clustered in space without any training, which means that text data can be classified by the GCN-based embedding process alone. Other methods, by contrast, require training with labeled data after embedding to complete data classification. Using the GCN auto-clustering feature together with labeled data, this research proposes an approach to detecting XSS attacks (called GCNXSS) that mines the dependencies between the units that constitute an XSS payload. First, GCNXSS transforms a URL into a homogeneous word graph based on word co-occurrence relationships. Then, GCNXSS feeds the graph into the GCN model for graph embedding and obtains the classification results. Experimental results show that GCNXSS achieved accuracy, precision, recall, F1-score, FNR, FPR, and prediction time of 99.97%, 99.75%, 99.97%, 99.86%, 0.03%, 0.03%, and 0.0461 ms, respectively. Compared with existing methods, GCNXSS has a lower FNR and FPR and stronger generalization ability.
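
The first step described above, turning a URL into a word graph from co-occurrence relationships, can be sketched as follows. The abstract does not specify the tokenizer or the window size, so the regular-expression tokenizer, the window of 3 tokens, and the example URL are all assumptions made for illustration.

```python
import re
from collections import Counter
from typing import List


def tokenize(url: str) -> List[str]:
    # Hypothetical tokenizer: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", url.lower())


def cooccurrence_edges(tokens: List[str], window: int = 3) -> Counter:
    """Count undirected edges between distinct words that share a sliding window."""
    edges: Counter = Counter()
    for i, left in enumerate(tokens):
        # Pair each token with the tokens that follow it inside the window.
        for right in tokens[i + 1:i + window]:
            if left != right:
                edges[tuple(sorted((left, right)))] += 1
    return edges


# Hypothetical URL carrying a script payload, for illustration only.
url = "http://example.com/search?q=<script>alert(1)</script>"
tokens = tokenize(url)
edges = cooccurrence_edges(tokens)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

The resulting weighted edge list is the kind of input a graph embedding model would consume; the GCN itself is outside the scope of this sketch.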

The Case Study for Childcare Service Demand Forecasting Using Bigdata Reference Analysis Model (빅데이터 표준분석모델을 활용한 초등돌봄 수요예측 사례연구)

  • Yun, Chung-Sik;Jeong, Seung Ryul
    • Journal of Internet Computing and Services / v.23 no.6 / pp.87-96 / 2022
  • This paper presents an empirical analysis of a reference model that can predict the maximum demand for elementary school student care in local governments across the country. Using machine learning, the study analyzed regional characteristics to predict the demand for elementary care in new apartment complexes. For this purpose, a total of 292 variables were used, including data on apartment structure, such as the number of parking spaces per household and the building-to-land ratio; environmental data around apartments, such as the distance to elementary schools; and population data for administrative districts. The use of such varied variables is significant and is meaningful for complex analysis. It is also an empirical case study that increased the reliability of the model by comparing its predictions with the actual values for basic local governments.

Resolving data imbalance through differentiated anomaly data processing based on verification data (검증데이터 기반의 차별화된 이상데이터 처리를 통한 데이터 불균형 해소 방법)

  • Hwang, Chulhyun
    • Journal of Intelligence and Information Systems / v.28 no.4 / pp.179-190 / 2022
  • Data imbalance refers to a phenomenon in which the number of samples in one category is much larger or much smaller than in another category. It has been identified as a major factor that degrades performance in machine learning that uses classification algorithms. To solve the data imbalance problem, various oversampling methods for amplifying minority distribution data have been proposed. Among them, SMOTE is the most representative method. To maximize the amplification effect on minority distribution data, various methods have emerged that remove noise included in the data (SMOTE-IPF) or reinforce only the borderline samples (Borderline SMOTE). This paper proposes a method that ultimately improves classification performance by improving how anomaly data is processed in the traditional SMOTE method, which amplifies minority class data. In experiments, the proposed method consistently showed relatively high classification performance compared to the existing methods.
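
As background, the baseline SMOTE idea that this abstract builds on creates each synthetic minority sample by interpolating between a minority point and one of its nearest minority neighbours. The sketch below shows plain SMOTE only, not the anomaly-handling method proposed in the paper; the function name, parameters, and sample points are hypothetical.

```python
import math
import random
from typing import List, Sequence


def smote(minority: List[Sequence[float]], n_new: int,
          k: int = 2, seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic points from at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself.
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


# Hypothetical 2-D minority samples, for illustration only.
minority = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
new_points = smote(minority, n_new=5)
for point in new_points:
    print([round(c, 3) for c in point])
```

Because every synthetic point lies on a segment between two existing minority samples, noisy or outlying samples are propagated too, which is the weakness that variants such as SMOTE-IPF and Borderline SMOTE address.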

Cryptocurrency Recommendation Model using the Similarity and Association Rule Mining (유사도와 연관규칙분석을 이용한 암호화폐 추천모형)

  • Kim, Yechan;Kim, Jinyoung;Kim, Chaerin;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems / v.28 no.4 / pp.287-308 / 2022
  • The explosive growth of cryptocurrency, led by Bitcoin, has recently emerged as a major issue in the financial market. As a result, interest in cryptocurrency investment is increasing, but a market that is open 24 hours a day, 365 days a year, high price volatility, and an exponentially increasing number of cryptocurrencies pose risks to cryptocurrency investors. For these reasons, there is a growing need for research that reduces investors' risks by separating out cryptocurrencies that are not suitable for recommendation. Unlike previous studies, which maximize returns by simply predicting future cryptocurrency prices or construct cryptocurrency portfolios focused on returns, this paper reflects the tendencies of investors and presents an interpretable recommendation method that can reduce investors' risks by selecting suitable altcoins, recommended using the Apriori algorithm, one of the machine learning techniques, based on their similarity to and association rules with Bitcoin.
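
To make the Apriori step concrete, the sketch below mines frequent itemsets level by level and then derives the confidence of one association rule. It is a generic, minimal Apriori under stated assumptions, not the paper's recommendation model: the transactions, ticker symbols, and support threshold are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]],
            min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support is at least min_support."""
    n = len(transactions)

    def support(itemset: FrozenSet[str]) -> float:
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([item]) for t in transactions for item in t}
    frequent: Dict[FrozenSet[str], float] = {}
    size = 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        size += 1
        # Join step: unions of frequent itemsets that have exactly `size` items.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: every (size - 1)-item subset must itself be frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level
                          for s in combinations(c, size - 1))}
    return frequent


# Hypothetical holdings of four investors, for illustration only.
transactions = [{"BTC", "ETH", "XRP"}, {"BTC", "ETH"},
                {"BTC", "XRP"}, {"ETH", "ADA"}]
frequent = apriori(transactions, min_support=0.5)

# Confidence of the rule BTC -> ETH = support(BTC and ETH) / support(BTC).
confidence = frequent[frozenset({"BTC", "ETH"})] / frequent[frozenset({"BTC"})]
print(f"confidence(BTC -> ETH) = {confidence:.2f}")
```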