• Title/Summary/Keyword: Machine Learning Algorithm

Classifying Windows Executables using API-based Information and Machine Learning (API 정보와 기계학습을 통한 윈도우 실행파일 분류)

  • Cho, DaeHee;Lim, Kyeonghwan;Cho, Seong-je;Han, Sangchul;Hwang, Young-sup
    • Journal of KIISE / v.43 no.12 / pp.1325-1333 / 2016
  • Software classification has several applications, such as copyright infringement detection, malware classification, and automatic categorization of software in repositories. It can also be employed by software filtering systems to prevent the transmission of illegal software. If illegal software is identified by measuring software similarity in a filtering system, classification can reduce the average number of comparisons by shrinking the search space. In this study, we focused on the classification of Windows executables using API call information and machine learning. We evaluated the performance of machine learning-based classifiers according to the refinement method applied to the API information and the machine learning algorithm used. The results showed that the classification success rate of an SVM (Support Vector Machine) with PolyKernel was higher than that of the other algorithms. Since API call information can be extracted from binary executables and machine learning-based classifiers can identify tampered executables, software classifiers based on API call information and machine learning are suitable for software filtering systems.
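
As a rough illustration of the kernel named in this abstract, the sketch below turns API call traces into count vectors and compares two of them with a polynomial kernel of the general form (x · y + c)^d. The vocabulary, the traces, the degree, and the constant are all hypothetical; the abstract does not state the features or kernel settings the authors used.

```python
from collections import Counter

def api_count_vector(api_calls, vocabulary):
    """Turn a list of API names into a count vector over a fixed vocabulary."""
    counts = Counter(api_calls)
    return [counts[name] for name in vocabulary]

def poly_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel: (x . y + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree

# Hypothetical vocabulary and API traces, for illustration only.
VOCAB = ["CreateFileW", "ReadFile", "WriteFile", "RegOpenKeyExW"]
editor = api_count_vector(["CreateFileW", "ReadFile", "WriteFile", "WriteFile"], VOCAB)
viewer = api_count_vector(["CreateFileW", "ReadFile", "ReadFile"], VOCAB)

# Similarity score an SVM with this kernel would work with.
print(poly_kernel(editor, viewer))
```

An SVM never needs the expanded polynomial features themselves, only these pairwise kernel values, which is why the kernel can be swapped without changing the rest of the classifier.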

Dropout Genetic Algorithm Analysis for Deep Learning Generalization Error Minimization

  • Park, Jae-Gyun;Choi, Eun-Soo;Kang, Min-Soo;Jung, Yong-Gyu
    • International Journal of Advanced Culture Technology / v.5 no.2 / pp.74-81 / 2017
  • Recently, many companies have adopted systems based on artificial intelligence. The accuracy of artificial intelligence depends on the amount of training data and the choice of an appropriate algorithm. However, it is not easy to obtain training data with a large number of entities. Small data sets lead to large generalization errors due to overfitting. To minimize this generalization error, this study proposes DGA (Dropout Genetic Algorithm), which applies a machine learning-based genetic algorithm to deep learning-based dropout and can achieve relatively high accuracy even with a small data set. The idea of this paper is to determine the active state of the nodes. A new fitness function is defined using the gradient of the loss function. The proposed DGA compensates for the stochastic inconsistency of dropout. DGA also addresses the problems caused by the complexity of the fitness function and the expression range of the model in the genetic algorithm. In experiments using MNIST data, the proposed algorithm reached an accuracy of 75.3%, whereas dropout alone reached 41.4%, showing that DGA performs better than dropout alone.
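
The phrase "determine the active state of the nodes" can be pictured with a binary mask over a layer's activations. The sketch below shows ordinary inverted dropout, where the mask is sampled at random, next to a fixed candidate mask of the kind a search procedure could propose. It is not the authors' DGA: the fitness function and genetic operators are not described in the abstract, and the function names and values here are hypothetical.

```python
import random

def random_mask(n_nodes, keep_prob, rng):
    """Standard dropout: each node is kept independently with probability keep_prob."""
    return [1 if rng.random() < keep_prob else 0 for _ in range(n_nodes)]

def apply_mask(activations, mask, keep_prob):
    """Zero out inactive nodes and rescale the rest (inverted dropout)."""
    return [a * m / keep_prob for a, m in zip(activations, mask)]

rng = random.Random(0)
activations = [1.0, 2.0, 3.0, 4.0]

# Plain dropout: the mask is sampled at random on every call.
sampled = random_mask(len(activations), keep_prob=0.5, rng=rng)

# A search procedure (for example a genetic algorithm) would instead propose
# masks and keep those that score well; this fixed mask stands in for one candidate.
candidate = [1, 0, 1, 0]
print(apply_mask(activations, candidate, keep_prob=0.5))
```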

A Bottle Recognition and Classification Algorithm for Deposit Refund (병 인식 및 보증금 환불을 위한 분류 알고리즘)

  • Jeong, Pil-seong;Cho, Yang-Hyun
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.9 / pp.1744-1751 / 2017
  • Countries around the world are striving to strengthen environmental regulations and reduce household waste. Korea is also striving for the circulation of energy resources by enacting laws to promote resource saving and recycling. The government has implemented an empty bottle deposit system for the recycling of empty bottles, but there is a limit to manual collection, and reverse vending machines have not been localized. In this paper, we propose a recyclable bottle recognition and classification algorithm, which is essential in a reverse vending machine, to promote energy resource circulation. The proposed algorithm is a complex identification algorithm using OpenCV and a CNN (Convolutional Neural Network). To evaluate the effectiveness of the proposed algorithm, we implement a classification system that operates in a reverse vending machine, so that information about bottles and reverse vending machines can easily be acquired on various devices.

Impact parameter prediction of a simulated metallic loose part using convolutional neural network

  • Moon, Seongin;Han, Seongjin;Kang, To;Han, Soonwoo;Kim, Kyungmo;Yu, Yongkyun;Eom, Joseph
    • Nuclear Engineering and Technology / v.53 no.4 / pp.1199-1209 / 2021
  • The detection of unexpected loose parts in the primary coolant system of a nuclear power plant remains an extremely important issue. It is essential to develop a methodology for the localization and mass estimation of loose parts owing to the high prediction error of conventional methods. An effective approach is presented for the localization and mass estimation of a loose part using machine-learning and deep-learning algorithms. First, a methodology was developed to estimate both the impact location and the mass of a loose part at the same time in a real structure in which geometric changes exist. Second, an impact database was constructed through a series of impact finite-element analyses (FEAs). Then, impact parameter prediction models were generated for localization and mass estimation of a simulated metallic loose part using machine-learning algorithms (artificial neural network, Gaussian process, and support vector machine) and a deep-learning algorithm (convolutional neural network). The usefulness of the methodology was validated through blind tests, and the effect of noise in the training data was also investigated. The high performance obtained in this study shows that the proposed methodology using an FEA-based database and deep learning is useful for localization and mass estimation of loose parts on site.

A Study on the Development of DGA based on Deep Learning (Deep Learning 기반의 DGA 개발에 대한 연구)

  • Park, Jae-Gyun;Choi, Eun-Soo;Kim, Byung-June;Zhang, Pan
    • Korean Journal of Artificial Intelligence / v.5 no.1 / pp.18-28 / 2017
  • Recently, many companies have adopted systems based on artificial intelligence. The accuracy of artificial intelligence depends on the amount of training data and the choice of an appropriate algorithm. However, it is not easy to obtain training data with a large number of entities. Small data sets lead to large generalization errors due to overfitting. To minimize this generalization error, this study proposes DGA, which applies a machine learning-based genetic algorithm to deep learning-based dropout and can achieve relatively high accuracy even with a small data set. The idea of this paper is to determine the active state of the nodes. A new fitness function is defined using the gradient of the loss function. The proposed DGA compensates for the stochastic inconsistency of dropout. DGA also addresses the problems caused by the complexity of the fitness function and the expression range of the model in the genetic algorithm. In experiments using MNIST data, the proposed algorithm reached an accuracy of 75.3%, whereas dropout alone reached 41.4%, showing that DGA performs better than dropout alone.

An Improvement of AdaBoost using Boundary Classifier

  • Lee, Wonju;Cheon, Minkyu;Hyun, Chang-Ho;Park, Mignon
    • Journal of the Korean Institute of Intelligent Systems / v.23 no.2 / pp.166-171 / 2013
  • The method proposed in this paper can improve the performance of the Boosting algorithm in machine learning. The proposed Boundary AdaBoost algorithm can make up for the weak points of a normal binary classifier by using the concept of a threshold boundary. The newly proposed boundary can be located near the threshold of the binary classifier. The proposed algorithm improves classification in areas where a normal binary classifier is weak. Thus, the final classifier with the optimal boundary can decrease error rates by classifying with more reasonable features. Finally, this paper derives the new algorithm's optimal solution and demonstrates how classifier accuracy can be improved using the proposed Boundary AdaBoost in a simulation experiment of pedestrian detection using 10-fold cross validation.
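
For context, the sketch below shows one round of standard AdaBoost, the baseline this abstract builds on: compute the weighted error of a weak classifier, derive its vote weight, and re-weight the samples. The boundary classifier proposed in the paper is not described in enough detail in the abstract to reproduce, so it is not attempted here; the labels and predictions are hypothetical.

```python
import math

def adaboost_round(weights, predictions, labels):
    """One standard AdaBoost update for labels and predictions in {-1, +1}.

    Assumes the weights sum to 1 and the weighted error is strictly between 0 and 1.
    """
    # Weighted error of the weak classifier on the current distribution.
    error = sum(w for w, p, y in zip(weights, predictions, labels) if p != y)
    # Vote weight of this weak classifier in the final ensemble.
    alpha = 0.5 * math.log((1.0 - error) / error)
    # Raise the weight of misclassified samples, lower the rest, renormalize.
    updated = [w * math.exp(-alpha * y * p)
               for w, p, y in zip(weights, predictions, labels)]
    total = sum(updated)
    return alpha, [w / total for w in updated]

labels      = [+1, +1, -1, -1]
predictions = [+1, -1, -1, -1]   # the weak classifier misses the second sample
start       = [0.25] * 4

alpha, new_weights = adaboost_round(start, predictions, labels)
print(alpha, new_weights)
```

After the update, the single misclassified sample carries half of the total weight, so the next weak classifier is pushed toward getting it right.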

Speed-up of the Matrix Computation on the Ridge Regression

  • Lee, Woochan;Kim, Moonseong;Park, Jaeyoung
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.10 / pp.3482-3497 / 2021
  • Artificial intelligence has emerged as the core of the 4th industrial revolution, and large-scale data processing, such as big data technology and rapid data analysis, is inevitable. The most fundamental and universal data interpretation technique is the analysis of information through regression, which is also the basis of machine learning. Ridge regression is a regression technique that decreases sensitivity to unique or outlier information. The time-consuming portion of the matrix computation, however, essentially involves computing an inverse matrix. As the size of the matrix grows, the matrix solution method becomes a major challenge. In this paper, a new algorithm is introduced to speed up the calculation of the ridge regression estimator through series expansion and computation recycling, without using an inverse matrix or other factorization methods in the calculation process. In addition, the performance of the proposed algorithm and the existing algorithm was compared according to matrix size. Overall, the proposed algorithm demonstrated excellent speed-up with good accuracy.
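
The ridge estimator solves (XᵀX + λI)β = Xᵀy, which is where the inverse matrix normally enters. The abstract does not spell out the authors' series expansion, so the sketch below uses a textbook stand-in: Richardson iteration, whose k-th iterate equals a Neumann series truncated after k terms. It illustrates that the estimator can be approached without an explicit inverse or factorization; the data, step size rule, and function names are assumptions, not the paper's algorithm.

```python
def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def ridge_without_inverse(X, y, lam, iterations=200):
    """Approximate the solution of (X^T X + lam*I) beta = X^T y by Richardson iteration."""
    n_features = len(X[0])
    # A = X^T X + lam * I is symmetric positive definite for lam > 0.
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(n_features)] for i in range(n_features)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(n_features)]
    # trace(A) is an upper bound on the largest eigenvalue, so this step converges.
    step = 1.0 / sum(A[i][i] for i in range(n_features))
    beta = [0.0] * n_features
    for _ in range(iterations):
        residual = [bi - ai for bi, ai in zip(b, matvec(A, beta))]
        beta = [bj + step * r for bj, r in zip(beta, residual)]
    return beta

# Tiny hypothetical data set: three samples, two features.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
print(ridge_without_inverse(X, y, lam=1.0))  # close to [0.875, 1.375]
```

Each iteration costs one matrix-vector product, so the trade-off against a direct solve depends on how many terms are needed, which in turn depends on the conditioning of XᵀX + λI.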

Machine Learning-based Screening Algorithm for Energy Storage System Using Retired Lithium-ion Batteries (에너지 저장 시스템 적용을 위한 머신러닝 기반의 폐배터리 스크리닝 알고리즘)

  • Han, Eui-Seong;Lim, Je-Yeong;Lee, Hyeon-Ho;Kim, Dong-Hwan;Noh, Tae-Won;Lee, Byoung-Kuk
    • The Transactions of the Korean Institute of Power Electronics / v.27 no.3 / pp.265-274 / 2022
  • This paper proposes a machine learning-based screening algorithm for building a retired battery pack for an energy storage system. The proposed algorithm creates a dataset of various performance parameters of the retired batteries, and this dataset is preprocessed through principal component analysis to reduce the overfitting problem. Retired batteries with a large deviation are excluded from the dataset through density-based spatial clustering of applications with noise (DBSCAN), and the K-means clustering method is formulated to select the group of retired batteries that satisfies the deviation requirements. The performance of the proposed algorithm is verified on the NASA and Oxford datasets.
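
The two later stages of this pipeline, outlier exclusion followed by grouping, can be sketched on a single scalar per cell standing in for the PCA-reduced features. The density filter below is a simplification of DBSCAN's noise rule (it only counts neighbors within a radius and does not distinguish border points), and the K-means is plain Lloyd's algorithm on scalars. The capacities, radius, and neighbor threshold are hypothetical.

```python
def drop_sparse_points(values, eps, min_neighbors):
    """Keep values that have at least min_neighbors other values within eps."""
    kept = []
    for i, v in enumerate(values):
        neighbors = sum(1 for j, u in enumerate(values)
                        if j != i and abs(u - v) <= eps)
        if neighbors >= min_neighbors:
            kept.append(v)
    return kept

def kmeans_1d(values, centers, iterations=20):
    """Lloyd's algorithm on scalars, starting from the given centers."""
    centers = list(centers)
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            groups[nearest].append(v)
        # Keep the old center if a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Hypothetical capacities in Ah for eight retired cells; 1.20 is an outlier.
capacities = [2.00, 2.02, 2.04, 2.50, 2.52, 2.54, 1.20, 2.01]
kept = drop_sparse_points(capacities, eps=0.05, min_neighbors=2)
centers, groups = kmeans_1d(kept, centers=[min(kept), max(kept)])
print(centers, groups)
```

Removing the isolated cell first matters because K-means centers are means, so a single far-off value would otherwise drag one group's center away from the cells it is meant to represent.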

Fault Prognostics of a SMPS based on PCA-SVM (PCA-SVM 기반의 SMPS 고장예지에 관한 연구)

  • Yoo, Yeon-Su;Kim, Dong-Hyeon;Kim, Seol;Hur, Jang-Wook
    • Journal of the Korean Society of Manufacturing Process Engineers / v.19 no.9 / pp.47-52 / 2020
  • With the 4th industrial revolution, condition monitoring using machine learning techniques has become popular among researchers. An overload due to complex operations causes several irregularities in MOSFETs. This study investigated the acquired voltage to analyze the overcurrent effects on MOSFETs using a failure mode effect analysis (FMEA). The results indicated that the voltage pattern changes greatly when the current is beyond the threshold value. Several features were extracted from the collected voltage signals that indicate the health state of a switched-mode power supply (SMPS). Then, the data were reduced to a smaller sample space by using a principal component analysis (PCA). A robust machine learning algorithm, the support vector machine (SVM), was used to classify different health states of an SMPS, and the classification results are presented for different parameters. An SVM approach assisted by a PCA algorithm provides a strong fault diagnosis framework for an SMPS.

A Study on the Comparison of Predictive Models of Cardiovascular Disease Incidence Based on Machine Learning

  • Ji Woo SEOK;Won ro LEE;Min Soo KANG
    • Korean Journal of Artificial Intelligence / v.11 no.1 / pp.1-7 / 2023
  • In this paper, prediction models for the occurrence of cardiovascular disease are compared. Cardiovascular disease is the leading cause of death worldwide, accounting for one third of all deaths, and the second leading cause of death in Korea. Primary prevention is the most important factor in preventing cardiovascular diseases before they occur. Early diagnosis and treatment are also important, as they play a role in reducing mortality and morbidity. In an experiment using Azure ML, Logistic Regression showed 88.6% accuracy, Decision Tree showed 86.4% accuracy, and Support Vector Machine (SVM) showed 83.7% accuracy. In addition to accuracy, the AUC of the ROC curve was 94.5%, 93%, and 92.4%, respectively, indicating that the machine learning models perform adequately; among them, the logistic regression model was the most accurate. The visual comparison of the algorithms in this paper can serve as an objective aid for diagnosis and guide the direction of diagnosis made by doctors in actual medical practice.
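
Accuracy and AUC, the two metrics reported in this abstract, measure different things: accuracy depends on a decision threshold, while AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below computes both from scratch on hypothetical labels and scores; it does not reproduce the paper's Azure ML experiment or its data.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    hits = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return hits / len(labels)

def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = disease) and predicted probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.1]
print(accuracy(labels, scores), roc_auc(labels, scores))
```

Here the model ranks most positives above most negatives (AUC of 8/9) even though the 0.5 threshold misclassifies two samples (accuracy of 4/6), which is why the two figures in a paper can differ noticeably for the same model.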