• Title/Summary/Keyword: Scikit-learn

Search Result 17, Processing Time 0.032 seconds

Clasification of Cyber Attack Group using Scikit Learn and Cyber Treat Datasets (싸이킷런과 사이버위협 데이터셋을 이용한 사이버 공격 그룹의 분류)

  • Kim, Kyungshin;Lee, Hojun;Kim, Sunghee;Kim, Byungik;Na, Wonshik;Kim, Donguk;Lee, Jeongwhan
    • Journal of Convergence for Information Technology
    • /
    • v.8 no.6
    • /
    • pp.165-171
    • /
    • 2018
  • The most threatening attack that has become a hot topic of recent IT security is APT Attack.. So far, there is no way to respond to APT attacks except by using artificial intelligence techniques. Here, we have implemented a machine learning algorithm for analyzing cyber threat data using machine learning method, using a data set that collects cyber attack cases using Scikit Learn, a big data machine learning framework. The result showed an attack classification accuracy close to 70%. This result can be developed into the algorithm of the security control system in the future.

Course recommendation system using deep learning (딥러닝을 이용한 강좌 추천시스템)

  • Min-Ah Lim;Seung-Yeon Hwang;Dong-Jin Shin;Jae-Kon Oh;Jeong-Joon Kim
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.23 no.3
    • /
    • pp.193-198
    • /
    • 2023
  • We study a learner-customized lecture recommendation project using deep learning. Recommendation systems can be easily found on the web and apps, and examples using this feature include recommending feature videos by clicking users and advertising items in areas of interest to users on SNS. In this study, the sentence similarity Word2Vec was mainly used to filter twice, and the course was recommended through the Surprise library. With this system, it provides users with the desired classification of course data conveniently and conveniently. Surprise Library is a Python scikit-learn-based library that is conveniently used in recommendation systems. By analyzing the data, the system is implemented at a high speed, and deeper learning is used to implement more precise results through course steps. When a user enters a keyword of interest, similarity between the keyword and the course title is executed, and similarity with the extracted video data and voice text is executed, and the highest ranking video data is recommended through the Surprise Library.

Performance Comparison Analysis of AI Supervised Learning Methods of Tensorflow and Scikit-Learn in the Writing Digit Data (필기숫자 데이터에 대한 텐서플로우와 사이킷런의 인공지능 지도학습 방식의 성능비교 분석)

  • Jo, Jun-Mo
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.14 no.4
    • /
    • pp.701-706
    • /
    • 2019
  • The advent of the AI(: Artificial Intelligence) has applied to many industrial and general applications have havingact on our lives these days. Various types of machine learning methods are supported in this field. The supervised learning method of the machine learning has features and targets as an input in the learning process. There are many supervised learning methods as well and their performance varies depends on the characteristics and states of the big data type as an input data. Therefore, in this paper, in order to compare the performance of the various supervised learning method with a specific big data set, the supervised learning methods supported in the Tensorflow and the Sckit-Learn are simulated and analyzed in the Jupyter Notebook environment with python.

Alarm program through image processing based on Machine Learning (ML 기반의 영상처리를 통한 알람 프로그램)

  • Kim, Deok-Min;Chung, Hyun-Woo;Park, Goo-Man
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • fall
    • /
    • pp.304-307
    • /
    • 2021
  • ML(machine learning) 기술을 활용하여 실용적인 측면에서 일반 사용자들이 바라보고 사용할 수 있도록 다양한 연구 개발이 이루어지고 있다. 특히 최근 개인 사용자의 personal computer와 mobile device의 processing unit의 연산 처리 속도가 두드러지게 빨라지고 있어 ML이 더 생활에 밀접해지고 있는 추세라고 볼 수 있다. 현재 ML시장에서 다양한 솔루션 및 어플리케이션을 제공하는 툴이나 라이브러리가 대거 공개되고 있는데 그 중에서도 Google에서 개발하여 배포한 'Mediapipe'를 사용하였다. Mediapipe는 현재 'android', 'IOS', 'C++', 'Python', 'JS', 'Coral' 등의 환경에서 개발을 지원하고 있으며 더욱 다양한 환경을 지원할 예정이다. 이에 본 팀은 앞서 설명한 Mediapipe 프레임워크를 기반으로 Machine Learning을 사용한 image processing를 통해 일반 사용자들에게 편의성을 제공할 수 있는 알람 프로그램을 연구 및 개발하였다. Mediapipe에서 신체를 landmark로 검출하게 되는데 이를 scikit-learn 머신러닝 라이브러리를 사용하여 특정 자세를 학습시키고 모델화하여 알람 프로그램에 특정 기능에 조건으로 사용될 수 있게 하였다. scikit-learn은 아나콘다 등과 같은 개발환경 패키지에서 간단하게 이용 가능한데 이 아나콘다는 데이터 분석이나 그래프 그리기 등, 파이썬에 자주 사용되는 라이브러리를 포함한 개발환경이라고 할 수 있다. 하여 본 팀은 ML기반의 영상처리 알람 프로그램을 제작하는데에 있어 이러한 사항들을 파이썬 환경에서 기본적으로 포함되어 제공하는 tkinter GUI툴을 사용하고 추가적으로 인텔에서 개발한 실시간 컴퓨터 비전을 목적으로 한 프로그래밍 라이브러리 OpenCV와 여러 항목을 사용하여 환경을 구축할 수 있도록 연구·개발하였다.

  • PDF

A Study on Applicability of Machine Learning for Book Classification of Public Libraries: Focusing on Social Science and Arts (공공도서관 도서 분류를 위한 머신러닝 적용 가능성 연구 - 사회과학과 예술분야를 중심으로 -)

  • Kwak, Chul Wan
    • Journal of the Korean BIBLIA Society for library and Information Science
    • /
    • v.32 no.1
    • /
    • pp.133-150
    • /
    • 2021
  • The purpose of this study is to identify the applicability of machine learning targeting titles in the classification of books in public libraries. Data analysis was performed using Python's scikit-learn library through the Jupiter notebook of the Anaconda platform. KoNLPy analyzer and Okt class were used for Hangul morpheme analysis. The units of analysis were 2,000 title fields and KDC classification class numbers (300 and 600) extracted from the KORMARC records of public libraries. As a result of analyzing the data using six machine learning models, it showed a possibility of applying machine learning to book classification. Among the models used, the neural network model has the highest accuracy of title classification. The study suggested the need for improving the accuracy of title classification, the need for research on book titles, tokenization of titles, and stop words.

Accuracy of Phishing Websites Detection Algorithms by Using Three Ranking Techniques

  • Mohammed, Badiea Abdulkarem;Al-Mekhlafi, Zeyad Ghaleb
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.2
    • /
    • pp.272-282
    • /
    • 2022
  • Between 2014 and 2019, the US lost more than 2.1 billion USD to phishing attacks, according to the FBI's Internet Crime Complaint Center, and COVID-19 scam complaints totaled more than 1,200. Phishing attacks reflect these awful effects. Phishing websites (PWs) detection appear in the literature. Previous methods included maintaining a centralized blacklist that is manually updated, but newly created pseudonyms cannot be detected. Several recent studies utilized supervised machine learning (SML) algorithms and schemes to manipulate the PWs detection problem. URL extraction-based algorithms and schemes. These studies demonstrate that some classification algorithms are more effective on different data sets. However, for the phishing site detection problem, no widely known classifier has been developed. This study is aimed at identifying the features and schemes of SML that work best in the face of PWs across all publicly available phishing data sets. The Scikit Learn library has eight widely used classification algorithms configured for assessment on the public phishing datasets. Eight was tested. Later, classification algorithms were used to measure accuracy on three different datasets for statistically significant differences, along with the Welch t-test. Assemblies and neural networks outclass classical algorithms in this study. On three publicly accessible phishing datasets, eight traditional SML algorithms were evaluated, and the results were calculated in terms of classification accuracy and classifier ranking as shown in tables 4 and 8. Eventually, on severely unbalanced datasets, classifiers that obtained higher than 99.0 percent classification accuracy. Finally, the results show that this could also be adapted and outperforms conventional techniques with good precision.

Heart Disease Prediction Using Decision Tree With Kaggle Dataset

  • Noh, Young-Dan;Cho, Kyu-Cheol
    • Journal of the Korea Society of Computer and Information
    • /
    • v.27 no.5
    • /
    • pp.21-28
    • /
    • 2022
  • All health problems that occur in the circulatory system are refer to cardiovascular illness, such as heart and vascular diseases. Deaths from cardiovascular disorders are recorded one third of in total deaths in 2019 worldwide, and the number of deaths continues to rise. Therefore, if it is possible to predict diseases that has high mortality rate with patient's data and AI system, they would enable them to be detected and be treated in advance. In this study, models are produced to predict heart disease, which is one of the cardiovascular diseases, and compare the performance of models with Accuracy, Precision, and Recall, with description of the way of improving the performance of the Decision Tree(Decision Tree, KNN (K-Nearest Neighbor), SVM (Support Vector Machine), and DNN (Deep Neural Network) are used in this study.). Experiments were conducted using scikit-learn, Keras, and TensorFlow libraries using Python as Jupyter Notebook in macOS Big Sur. As a result of comparing the performance of the models, the Decision Tree demonstrates the highest performance, thus, it is recommended to use the Decision Tree in this study.

Income prediction of apple and pear farmers in Chungnam area by automatic machine learning with H2O.AI

  • Hyundong, Jang;Sounghun, Kim
    • Korean Journal of Agricultural Science
    • /
    • v.49 no.3
    • /
    • pp.619-627
    • /
    • 2022
  • In Korea, apples and pears are among the most important agricultural products to farmers who seek to earn money as income. Generally, farmers make decisions at various stages to maximize their income but they do not always know exactly which option will be the best one. Many previous studies were conducted to solve this problem by predicting farmers' income structure, but researchers are still exploring better approaches. Currently, machine learning technology is gaining attention as one of the new approaches for farmers' income prediction. The machine learning technique is a methodology using an algorithm that can learn independently through data. As the level of computer science develops, the performance of machine learning techniques is also improving. The purpose of this study is to predict the income structure of apples and pears using the automatic machine learning solution H2O.AI and to present some implications for apple and pear farmers. The automatic machine learning solution H2O.AI can save time and effort compared to the conventional machine learning techniques such as scikit-learn, because it works automatically to find the best solution. As a result of this research, the following findings are obtained. First, apple farmers should increase their gross income to maximize their income, instead of reducing the cost of growing apples. In particular, apple farmers mainly have to increase production in order to obtain more gross income. As a second-best option, apple farmers should decrease labor and other costs. Second, pear farmers also should increase their gross income to maximize their income but they have to increase the price of pears rather than increasing the production of pears. As a second-best option, pear farmers can decrease labor and other costs.

Development of Multilayer Perceptron Model for the Prediction of Alcohol Concentration of Makgeolli

  • Kim, JoonYong;Rho, Shin-Joung;Cho, Yun Sung;Cho, EunSun
    • Journal of Biosystems Engineering
    • /
    • v.43 no.3
    • /
    • pp.229-236
    • /
    • 2018
  • Purpose: Makgeolli is a traditional alcoholic beverage made from rice with a fermentation starter called "nuruk." The concentration of alcohol in makgeolli depends on the temperature of the fermentation tank. It is important to monitor the alcohol concentration to manage the makgeolli production process. Methods: Data were collected from 84 makgeolli fermentation tanks over a year period. Independent variables included the temperatures of the tanks and the room where the tanks were located, as well as the quantity, acidity, and water concentration of the source. Software for the multilayer perceptron model (MLP) was written in Python using the Scikit-learn library. Results: Many models were created for which the optimization converged within 100 iterations, and their coefficients of determination $R^2$ were considerably high. The coefficient of determination $R^2$ of the best model with the training set and the test set were 0.94 and 0.93, respectively. The fact that the difference between them was very small indicated that the model was not overfitted. The maximum and minimum error was approximately 2% and the total MSE was 0.078%. Conclusions: The MLP model could help predict the alcohol concentration and to control the production process of makgeolli. In future research, the optimization of the production process will be studied based on the model.

Emotion Recognition of Low Resource (Sindhi) Language Using Machine Learning

  • Ahmed, Tanveer;Memon, Sajjad Ali;Hussain, Saqib;Tanwani, Amer;Sadat, Ahmed
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.8
    • /
    • pp.369-376
    • /
    • 2021
  • One of the most active areas of research in the field of affective computing and signal processing is emotion recognition. This paper proposes emotion recognition of low-resource (Sindhi) language. This work's uniqueness is that it examines the emotions of languages for which there is currently no publicly accessible dataset. The proposed effort has provided a dataset named MAVDESS (Mehran Audio-Visual Dataset Mehran Audio-Visual Database of Emotional Speech in Sindhi) for the academic community of a significant Sindhi language that is mainly spoken in Pakistan; however, no generic data for such languages is accessible in machine learning except few. Furthermore, the analysis of various emotions of Sindhi language in MAVDESS has been carried out to annotate the emotions using line features such as pitch, volume, and base, as well as toolkits such as OpenSmile, Scikit-Learn, and some important classification schemes such as LR, SVC, DT, and KNN, which will be further classified and computed to the machine via Python language for training a machine. Meanwhile, the dataset can be accessed in future via https://doi.org/10.5281/zenodo.5213073.