• Title/Summary/Keyword: machine learning classification models

A study on evaluation method of NIDS datasets in closed military network (군 폐쇄망 환경에서의 모의 네트워크 데이터 셋 평가 방법 연구)

  • Park, Yong-bin;Shin, Sung-uk;Lee, In-sup
    • Journal of Internet Computing and Services / v.21 no.2 / pp.121-130 / 2020
  • This paper proposes evaluating military closed-network data generated by a Generative Adversarial Network (GAN) as images, applying image evaluation methods such as the InceptionV3-based Inception Score (IS) and Frechet Inception Distance (FID). We employed well-known image classification models in place of InceptionV3, added layers to those models, and converted the network data to images in several ways. Experimental results show that the DenseNet121 model with one added dense layer achieves the best performance on data converted with the arctangent algorithm at an image size of 8 × 8.
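
For reference, the two image metrics named in the abstract are commonly defined as below. The notation is introduced here and is not taken from the paper: p(y | x) is the classifier's label distribution for a generated sample x, p(y) is its marginal over generated samples, and (μ_r, Σ_r) and (μ_g, Σ_g) are the mean and covariance of classifier features for real and generated data. The paper substitutes other classification models for InceptionV3 as the feature extractor.

```latex
\mathrm{IS} = \exp\!\Big( \mathbb{E}_{x \sim p_g} \, D_{\mathrm{KL}}\big( p(y \mid x) \,\|\, p(y) \big) \Big)

\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\Big( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \Big)
```

A higher IS and a lower FID are conventionally read as better generated data.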

Optimization-Based Pattern Generation for LAD (최적화에 기반을 둔 LAD의 패턴 생성 기법)

  • Jang, In-Yong;Ryoo, Hong-Seo
    • Journal of the Korea Society of Computer and Information / v.11 no.1 s.39 / pp.11-18 / 2006
  • The logical analysis of data (LAD) is a Boolean-logic-based data mining tool. A critical step in analyzing data with LAD is the pattern generation stage, where useful knowledge and hidden structural information in the data are discovered in the form of patterns. The conventional method for pattern generation in LAD is based on term enumeration, which makes generating higher-degree patterns practically impossible. In this paper, we present a novel optimization-based pattern generation methodology and propose two mathematical programming models: a mixed 0-1 integer and linear programming (MILP) formulation for generating optimal patterns, and a well-studied set covering problem (SCP) formulation for generating heuristic patterns. On benchmark datasets, we demonstrate the effectiveness of our models by easily and automatically generating patterns of high complexity that cannot be generated with the conventional approach.
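
The abstract does not give the SCP formulation itself. As general background only, the classic greedy heuristic for set covering can be sketched as follows. The function name, the toy data, and the reading of subsets as "observations covered by a candidate pattern" are assumptions made for illustration, not the authors' model.

```python
def greedy_set_cover(universe, subsets):
    """Return indices of subsets chosen greedily until `universe` is covered."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the subset that covers the most still-uncovered elements.
        best = max(range(len(subsets)), key=lambda i: len(uncovered & subsets[i]))
        if not uncovered & subsets[best]:
            raise ValueError("universe cannot be covered by the given subsets")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen


# Hypothetical toy data: six observations and the observations
# that each of five candidate patterns covers.
observations = {1, 2, 3, 4, 5, 6}
pattern_coverage = [{1, 2, 3}, {2, 4}, {3, 4, 5}, {5, 6}, {1, 6}]
selected = greedy_set_cover(observations, pattern_coverage)
```

The greedy rule gives a cover quickly but not necessarily a minimum one, which is the usual trade-off between heuristic and exact formulations.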

SVM on Top of Deep Networks for Covid-19 Detection from Chest X-ray Images

  • Do, Thanh-Nghi;Le, Van-Thanh;Doan, Thi-Huong
    • Journal of information and communication convergence engineering / v.20 no.3 / pp.219-225 / 2022
  • In this study, we propose training a support vector machine (SVM) model on top of deep networks to detect Covid-19 from chest X-ray images. We started by gathering a real chest X-ray image dataset that includes positive Covid-19 cases, normal cases, and other lung diseases not caused by Covid-19. Instead of training deep networks from scratch, we fine-tuned recent pre-trained deep network models, such as DenseNet121, MobileNet v2, Inception v3, Xception, ResNet50, VGG16, and VGG19, to classify chest X-ray images into one of three classes (Covid-19, normal, and other lung diseases). The SVM model trained on top of the deep networks performs a nonlinear combination of their outputs, improving classification over any single deep network. Empirical results on the real chest X-ray image dataset show that the deep network models, with the exception of ResNet50 (82.44%), provide an accuracy of at least 92% on the test set. The proposed SVM on top of the deep networks achieved the highest accuracy, 96.16%.

Indoor positioning system using Xgboosting (Xgboosting 기법을 이용한 실내 위치 측위 기법)

  • Hwang, Chi-Gon;Yoon, Chang-Pyo;Kim, Dae-Jin
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.10a / pp.492-494 / 2021
  • Decision trees are widely used for classification in machine learning. However, decision trees are prone to overfitting, which can cost speed and resources. Bagging and boosting address this problem: bagging draws multiple samples from the data and builds a model on each, while boosting models the sampled data and adjusts weights to reduce overfitting. More recently, techniques such as XGBoost have been introduced to improve performance. In this paper, we collect Wi-Fi signal data for indoor positioning, apply the existing methods and XGBoost to it, and evaluate their performance.
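
The bagging idea described in the abstract (bootstrap samples, one model per sample, majority vote) can be illustrated with a minimal standard-library sketch. It uses one-feature decision stumps rather than full trees, covers bagging only (not boosting or XGBoost), and the signal values and room labels are hypothetical, not the paper's data.

```python
import random
from collections import Counter


def fit_stump(samples):
    """Fit a one-feature threshold classifier on (x, label) pairs."""
    best = None
    for threshold in sorted({x for x, _ in samples}):
        left = [y for x, y in samples if x <= threshold]
        right = [y for x, y in samples if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


def bagging_fit(samples, n_models, seed=0):
    """Train one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        bootstrap = [rng.choice(samples) for _ in samples]
        models.append(fit_stump(bootstrap))
    return models


def bagging_predict(models, x):
    """Combine the stumps by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical data: Wi-Fi signal strength (dBm) labelled with a room.
data = [(-40, "room_a"), (-45, "room_a"), (-50, "room_a"),
        (-70, "room_b"), (-75, "room_b"), (-80, "room_b")]
ensemble = bagging_fit(data, n_models=15)
```

Calling `bagging_predict(ensemble, -30)` asks every stump for a label and returns the most common answer.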

Integrating a Machine Learning-based Space Classification Model with an Automated Interior Finishing System in BIM Models

  • Ha, Daemok;Yu, Youngsu;Choi, Jiwon;Kim, Sihyun;Koo, Bonsang
    • Korean Journal of Construction Engineering and Management / v.24 no.4 / pp.60-73 / 2023
  • The need to adopt automation technologies that reduce inefficiencies in interior finishing modeling during the Building Information Modeling (BIM) design stage is increasing. As a result, the use of visual programming languages (VPL) for practical applications is growing. However, undefined or incorrect space designations in BIM models can hinder the development of automated finishing modeling processes, resulting in erroneous corrections and rework. To address this challenge, this study first developed a rule-based automated interior finishing detailing module for floors, walls, and ceilings. In addition, an automated space integrity checking module based on a Multi-Layer Perceptron (MLP) model was developed, achieving 86.69% accuracy (ACC). These modules were integrated into a design automation module for interior finishing, which was then verified for practical utility. The results showed that the automation module reduced the time required for modeling and integrity checking by 97.6% compared to manual work, confirming its utility in assisting BIM model development for interior finishing works.

Parallel Network Model of Abnormal Respiratory Sound Classification with Stacking Ensemble

  • Nam, Myung-woo;Choi, Young-Jin;Choi, Hoe-Ryeon;Lee, Hong-Chul
    • Journal of the Korea Society of Computer and Information / v.26 no.11 / pp.21-31 / 2021
  • As the COVID-19 pandemic rapidly changes healthcare around the globe, the need for smart healthcare that allows remote diagnosis is increasing. The current classification of respiratory diseases is costly and requires a face-to-face visit with a skilled medical professional, so the pandemic significantly hinders monitoring and early diagnosis. The ability to accurately classify and diagnose respiratory sounds using deep learning-based AI models is therefore essential to modern medicine as a remote alternative to the stethoscope. In this study, we propose a deep learning-based respiratory sound classification model using data collected from medical experts. The sound data were preprocessed with a band-pass filter, and the relevant respiratory audio features were extracted as Log-Mel Spectrograms and Mel Frequency Cepstral Coefficients (MFCC). A parallel CNN model was then trained on these two inputs, using stacking ensemble techniques combined with various machine learning classifiers, to classify and detect abnormal respiratory sounds. The proposed model classified abnormal respiratory sounds with an accuracy of 96.9%, approximately 6.1% higher than the classification accuracy of the baseline model.

Evaluating the prediction models of leaf wetness duration for citrus orchards in Jeju, South Korea (제주 감귤 과수원에서의 이슬지속시간 예측 모델 평가)

  • Park, Jun Sang;Seo, Yun Am;Kim, Kyu Rang;Ha, Jong-Chul
    • Korean Journal of Agricultural and Forest Meteorology / v.20 no.3 / pp.262-276 / 2018
  • Models to predict Leaf Wetness Duration (LWD) were evaluated using meteorological and dew data observed at 11 citrus orchards in Jeju, South Korea, from 2016 to 2017. Sensitivity and prediction accuracy were evaluated for four models: Number of Hours of Relative Humidity (NHRH), Classification And Regression Tree/Stepwise Linear Discriminant (CART/SLD), Penman-Monteith (PM), and Deep-learning Neural Network (DNN). Model sensitivity was evaluated with respect to rainfall and seasonal changes. When data from rainy days were excluded from the data set, the LWD models had a smaller average error (Root Mean Square Error (RMSE) of about 1.5 hours). The seasonal error of the DNN model had a similar magnitude (RMSE of about 3 hours) in all seasons except winter. The other models had the greatest error in summer (RMSE of about 9.6 hours) and the lowest error in winter (RMSE of about 3.3 hours). The models were also evaluated by statistical error analysis and by regression analysis of the mean squared deviation. The DNN model had the best performance by statistical error, whereas the CART/SLD model had the worst prediction accuracy. The Mean Square Deviation (MSD) is a method of analyzing the linearity of a model with three components: squared bias (SB), nonunity slope (NU), and lack of correlation (LC). Better model performance was indicated by lower SB and LC and higher NU. The MSD analysis indicated that the DNN model provided the best performance, followed by the PM, NHRH, and CART/SLD models, in that order. This result suggests that machine learning models would be useful for improving the accuracy of agricultural information derived from meteorological data.
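
The abstract names the three MSD components without giving formulas. A minimal sketch of the commonly used decomposition MSD = SB + NU + LC follows, assuming that X is the model prediction, Y is the observation, the slope is taken from the least-squares regression of Y on X, and variances are divided by n. The function name and the sample LWD values are hypothetical, not the paper's data.

```python
import math


def msd_components(predicted, observed):
    """Split the mean squared deviation into SB, NU, and LC."""
    n = len(predicted)
    mean_x = sum(predicted) / n
    mean_y = sum(observed) / n
    # Population (divide-by-n) variances and covariance.
    sxx = sum((x - mean_x) ** 2 for x in predicted) / n
    syy = sum((y - mean_y) ** 2 for y in observed) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(predicted, observed)) / n
    slope = sxy / sxx                    # regression of observed on predicted
    r_squared = sxy ** 2 / (sxx * syy)   # squared correlation coefficient
    sb = (mean_x - mean_y) ** 2          # squared bias
    nu = (1 - slope) ** 2 * sxx          # nonunity slope
    lc = (1 - r_squared) * syy           # lack of correlation
    msd = sum((x - y) ** 2 for x, y in zip(predicted, observed)) / n
    return {"MSD": msd, "SB": sb, "NU": nu, "LC": lc, "RMSE": math.sqrt(msd)}


# Hypothetical daily LWD values in hours.
predicted = [6.0, 9.5, 12.0, 4.0, 8.0]
observed = [5.0, 10.0, 14.0, 3.5, 9.0]
parts = msd_components(predicted, observed)
```

Under these definitions the three components sum to the MSD exactly, and the RMSE reported in the abstract is the square root of the MSD.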

Comparative analysis of Machine-Learning Based Models for Metal Surface Defect Detection (머신러닝 기반 금속외관 결함 검출 비교 분석)

  • Lee, Se-Hun;Kang, Seong-Hwan;Shin, Yo-Seob;Choi, Oh-Kyu;Kim, Sijong;Kang, Jae-Mo
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.6 / pp.834-841 / 2022
  • Recently, applying artificial intelligence technologies in various fields of production has drawn an upsurge of research interest, driven by the growth of smart factory and artificial intelligence technologies. A great deal of effort is being made to introduce artificial intelligence algorithms into the defect detection task. In particular, detecting defects on metal surfaces attracts more research interest than detecting defects on other materials (wood, plastics, fibers, etc.). In this paper, we compare and analyze the speed and performance of defect classification for machine learning techniques (Support Vector Machine, Softmax Regression, Decision Tree) combined with dimensionality reduction algorithms (Principal Component Analysis, AutoEncoders) and for two convolutional neural networks (the proposed method and ResNet). To validate and compare the performance and speed of the algorithms, we adopted two datasets, (i) a public dataset and (ii) an actual dataset, and determined the most efficient algorithm on the basis of the results.

A Classification Model for Attack Mail Detection based on the Authorship Analysis (작성자 분석 기반의 공격 메일 탐지를 위한 분류 모델)

  • Hong, Sung-Sam;Shin, Gun-Yoon;Han, Myung-Mook
    • Journal of Internet Computing and Services / v.18 no.6 / pp.35-46 / 2017
  • Recently, cyber attacks in which an attacker attaches malicious code to a mail and induces the user to execute it have increased. Such attacks are especially dangerous because the code is easy to execute when it is attached as a document file. Authorship analysis is a research area studied in NLP (Natural Language Processing) and text mining; it studies methods of identifying authors by analyzing sentences, texts, and documents in a specific language. An attack mail is created by the attacker. Therefore, by analyzing the contents of the mail and the attached document file and identifying the corresponding author, it is possible to discover features that distinguish it more clearly from normal mail and to improve detection accuracy. In this paper, we propose the IADA2 (Intelligent Attack mail Detection based on Authorship Analysis) model for attack mail detection. The feature vector used to classify and detect attack mail is built from the features used in existing machine learning-based spam detection models and the features used in authorship analysis of documents, and it is the basis of the IADA2 detection model. We improved on attack mail detection models that simply detect term features by extracting features that reflect the sequence characteristics of words through n-grams. Experimental results show that the proposed method improves performance depending on the feature combinations, feature selection techniques, and models used.
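
To illustrate how n-grams capture word order that single-term features miss, here is a minimal standard-library sketch. The whitespace tokenization, the function names, and the sample sentence are assumptions for illustration; the abstract does not specify the IADA2 feature extraction at this level of detail.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams of `text` as tuples, in order of appearance."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text, n_values=(1, 2)):
    """Count n-grams for each n in `n_values` (unigrams and bigrams by default)."""
    counts = Counter()
    for n in n_values:
        counts.update(word_ngrams(text, n))
    return counts


# Hypothetical mail body.
features = ngram_features("Please open the attached invoice and open the link")
```

Here the bigram `("open", "the")` is counted twice, a sequence feature that unigram counts of "open" and "the" alone would not distinguish from other word orders.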

Sentiment Analysis of Movie Review Using Integrated CNN-LSTM Model (CNN-LSTM 조합모델을 이용한 영화리뷰 감성분석)

  • Park, Ho-yeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems / v.25 no.4 / pp.141-154 / 2019
  • Internet technology and social media are growing rapidly. Data mining technology has evolved to enable unstructured document representations in a variety of applications. Sentiment analysis is an important technology that can distinguish low-quality from high-quality content using text data about products, and it has proliferated within text mining. Sentiment analysis mainly analyzes people's opinions in text data by assigning predefined categories such as positive and negative. It has been studied from several directions in terms of accuracy, from simple rule-based approaches to dictionary-based approaches using predefined labels. Sentiment analysis is one of the most active research areas in natural language processing and is widely studied in text mining. Because real online reviews are openly available to others, the information is not only easy to collect but also affects business. In marketing, real-world information from customers is gathered from websites rather than surveys. Whether a website's posts are positive or negative is reflected in sales, so companies try to identify this information. However, many reviews on a website are not always good, and they are difficult to identify. Earlier studies in this area used review data from the Amazon.com shopping mall, whereas recent studies use data on stock market trends, blogs, news articles, weather forecasts, IMDB, Facebook, and so on. However, accuracy remains limited because sentiment calculations change with the subject, paragraph, sentiment lexicon direction, and sentence strength. This study aims to classify sentiment polarity into positive and negative categories and to increase the prediction accuracy of polarity analysis using the pretrained IMDB review data set.
First, popular machine learning algorithms for sentiment-related text classification, namely NB (naive Bayes), SVM (support vector machines), XGBoost, RF (random forests), and Gradient Boost, were adopted as comparative models. Second, deep learning has demonstrated the ability to extract complex, discriminative features from data. Representative algorithms are CNN (convolutional neural networks), RNN (recurrent neural networks), and LSTM (long short-term memory). A CNN can be used similarly to BoW when processing a sentence in vector format, but it does not consider the sequential attributes of the data. An RNN handles ordered data well because it takes the time information of the data into account, but it suffers from the long-term dependency problem. LSTM is used to solve this problem. For the comparison, CNN and LSTM were chosen as simple deep learning models. In addition to the classical machine learning algorithms, CNN, LSTM, and the integrated models were analyzed. Although the algorithms have many parameters, we examined the relationship between parameter values and precision to find the optimal combination. We also tried to determine how well the models work for sentiment analysis and how they work. This study proposes an integrated CNN and LSTM algorithm to extract the positive and negative features of text. The reasons for combining these two algorithms are as follows. A CNN can automatically extract features for classification by applying convolution layers and massively parallel processing. An LSTM is not capable of highly parallel processing. Like faucets, the LSTM's input, output, and forget gates can be opened and closed at a desired time. These gates have the advantage of placing memory blocks on hidden nodes. The memory block of the LSTM may not store all the data, but it can compensate for the CNN's inability to capture long-term dependencies.
Furthermore, when the LSTM is applied after the CNN's pooling layer, the model has an end-to-end structure, so spatial and temporal features can be learned simultaneously. The combined CNN-LSTM model achieved 90.33% accuracy. It is slower than the CNN but faster than the LSTM. The presented model was more accurate than the other models. In addition, each word embedding layer can be improved when training the kernel step by step. CNN-LSTM can compensate for the weaknesses of each model, and the end-to-end structure of the LSTM has the advantage of improving learning layer by layer. For these reasons, this study seeks to enhance the classification accuracy of movie reviews using the integrated CNN-LSTM model.
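
For reference, the input, forget, and output gates mentioned in the abstract are usually written as below in the standard LSTM formulation. The symbols (x_t for the input at step t, h_t for the hidden state, c_t for the memory cell, W, U, b for learned parameters) are introduced here and are not taken from the paper, which may use a variant.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

Each step depends on h_{t-1} and c_{t-1}, which is why the recurrence cannot be parallelized across time steps the way a convolution can.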