• Title/Summary/Keyword: 기계학습(머신러닝)

Search Result 150, Processing Time 0.025 seconds

Artificial Intelligence and College Mathematics Education (인공지능(Artificial Intelligence)과 대학수학교육)

  • Lee, Sang-Gu;Lee, Jae Hwa;Ham, Yoonmee
    • Communications of Mathematical Education
    • /
    • v.34 no.1
    • /
    • pp.1-15
    • /
    • 2020
  • Today's healthcare, intelligent robots, smart home systems, and car sharing are already innovating with cutting-edge information and communication technologies such as Artificial Intelligence (AI), the Internet of Things, the Internet of Intelligent Things, and Big data. It is deeply affecting our lives. In the factory, robots have been working for humans more than several decades (FA, OA), AI doctors are also working in hospitals (Dr. Watson), AI speakers (Giga Genie) and AI assistants (Siri, Bixby, Google Assistant) are working to improve Natural Language Process. Now, in order to understand AI, knowledge of mathematics becomes essential, not a choice. Thus, mathematicians have been given a role in explaining such mathematics that make these things possible behind AI. Therefore, the authors wrote a textbook 'Basic Mathematics for Artificial Intelligence' by arranging the mathematics concepts and tools needed to understand AI and machine learning in one or two semesters, and organized lectures for undergraduate and graduate students of various majors to explore careers in artificial intelligence. In this paper, we share our experience of conducting this class with the full contents in http://matrix.skku.ac.kr/math4ai/.

Study on High-speed Cyber Penetration Attack Analysis Technology based on Static Feature Base Applicable to Endpoints (Endpoint에 적용 가능한 정적 feature 기반 고속의 사이버 침투공격 분석기술 연구)

  • Hwang, Jun-ho;Hwang, Seon-bin;Kim, Su-jeong;Lee, Tae-jin
    • Journal of Internet Computing and Services
    • /
    • v.19 no.5
    • /
    • pp.21-31
    • /
    • 2018
  • Cyber penetration attacks can not only damage cyber space but can attack entire infrastructure such as electricity, gas, water, and nuclear power, which can cause enormous damage to the lives of the people. Also, cyber space has already been defined as the fifth battlefield, and strategic responses are very important. Most of recent cyber attacks are caused by malicious code, and since the number is more than 1.6 million per day, automated analysis technology to cope with a large amount of malicious code is very important. However, it is difficult to deal with malicious code encryption, obfuscation and packing, and the dynamic analysis technique is not limited to the performance requirements of dynamic analysis but also to the virtual There is a limit in coping with environment avoiding technology. In this paper, we propose a machine learning based malicious code analysis technique which improve the weakness of the detection performance of existing analysis technology while maintaining the light and high-speed analysis performance applicable to commercial endpoints. The results of this study show that 99.13% accuracy, 99.26% precision and 99.09% recall analysis performance of 71,000 normal file and malicious code in commercial environment and analysis time in PC environment can be analyzed more than 5 per second, and it can be operated independently in the endpoint environment and it is considered that it works in complementary form in operation in conjunction with existing antivirus technology and static and dynamic analysis technology. It is also expected to be used as a core element of EDR technology and malware variant analysis.

Modeling for Egg Price Prediction by Using Machine Learning (기계학습을 활용한 계란가격 예측 모델링)

  • Cho, Hohyun;Lee, Daekyeom;Chae, Yeonghun;Chang, Dongil
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2022.05a
    • /
    • pp.15-17
    • /
    • 2022
  • In the aftermath of the avian influenza that occurred from the second half of 2020 to the beginning of 2021, 17.8 million laying hens were slaughtered. Although the government invested more than 100 billion won for egg imports as a measure to stabilize prices, the effort was not that easy. The sharp volatility of egg prices negatively affected both consumers and poultry farmers, so measures were needed to stabilize egg prices. To this end, the egg prices were successfully predicted in this study by using the analysis algorithm of a machine learning regression. For price prediction, a total of 8 independent variables, including monthly broiler chicken production statistics for 2012-2021 of the Korean Poultry Association and the slaughter performance of the national statistics portal (kosis), have been selected to be used. The Root Mean Square Error (RMSE), which indicates the difference between the predicted price and the actual price, is at the level of 103 (won), which can be interpreted as explaining the egg prices relatively well predicted. Accurate prediction of egg prices lead to flexible adjustment of egg production weeks for laying hens, which can help decision-making about stocking of laying hens. This result is expected to help secure egg price stability.

  • PDF

Photovoltaic Generation Forecasting Using Weather Forecast and Predictive Sunshine and Radiation (일기 예보와 예측 일사 및 일조를 이용한 태양광 발전 예측)

  • Shin, Dong-Ha;Park, Jun-Ho;Kim, Chang-Bok
    • Journal of Advanced Navigation Technology
    • /
    • v.21 no.6
    • /
    • pp.643-650
    • /
    • 2017
  • Photovoltaic generation which has unlimited energy sources are very intermittent because they depend on the weather. Therefore, it is necessary to get accurate generation prediction with reducing the uncertainty of photovoltaic generation and improvement of the economics. The Meteorological Agency predicts weather factors for three days, but doesn't predict the sunshine and solar radiation that are most correlated with the prediction of photovoltaic generation. In this study, we predict sunshine and solar radiation using weather, precipitation, wind direction, wind speed, humidity, and cloudiness which is forecasted for three days at Meteorological Agency. The photovoltaic generation forecasting model is proposed by using predicted solar radiation and sunshine. As a result, the proposed model showed better results in the error rate indexes such as MAE, RMSE, and MAPE than the model that predicts photovoltaic generation without radiation and sunshine. In addition, DNN showed a lower error rate index than using SVM, which is a type of machine learning.

Comparative analysis of wavelet transform and machine learning approaches for noise reduction in water level data (웨이블릿 변환과 기계 학습 접근법을 이용한 수위 데이터의 노이즈 제거 비교 분석)

  • Hwang, Yukwan;Lim, Kyoung Jae;Kim, Jonggun;Shin, Minhwan;Park, Youn Shik;Shin, Yongchul;Ji, Bongjun
    • Journal of Korea Water Resources Association
    • /
    • v.57 no.3
    • /
    • pp.209-223
    • /
    • 2024
  • In the context of the fourth industrial revolution, data-driven decision-making has increasingly become pivotal. However, the integrity of data analysis is compromised if data quality is not adequately ensured, potentially leading to biased interpretations. This is particularly critical for water level data, essential for water resource management, which often encounters quality issues such as missing values, spikes, and noise. This study addresses the challenge of noise-induced data quality deterioration, which complicates trend analysis and may produce anomalous outliers. To mitigate this issue, we propose a noise removal strategy employing Wavelet Transform, a technique renowned for its efficacy in signal processing and noise elimination. The advantage of Wavelet Transform lies in its operational efficiency - it reduces both time and costs as it obviates the need for acquiring the true values of collected data. This study conducted a comparative performance evaluation between our Wavelet Transform-based approach and the Denoising Autoencoder, a prominent machine learning method for noise reduction.. The findings demonstrate that the Coiflets wavelet function outperforms the Denoising Autoencoder across various metrics, including Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Mean Squared Error (MSE). The superiority of the Coiflets function suggests that selecting an appropriate wavelet function tailored to the specific application environment can effectively address data quality issues caused by noise. This study underscores the potential of Wavelet Transform as a robust tool for enhancing the quality of water level data, thereby contributing to the reliability of water resource management decisions.

Development of New Variables Affecting Movie Success and Prediction of Weekly Box Office Using Them Based on Machine Learning (영화 흥행에 영향을 미치는 새로운 변수 개발과 이를 이용한 머신러닝 기반의 주간 박스오피스 예측)

  • Song, Junga;Choi, Keunho;Kim, Gunwoo
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.4
    • /
    • pp.67-83
    • /
    • 2018
  • The Korean film industry with significant increase every year exceeded the number of cumulative audiences of 200 million people in 2013 finally. However, starting from 2015 the Korean film industry entered a period of low growth and experienced a negative growth after all in 2016. To overcome such difficulty, stakeholders like production company, distribution company, multiplex have attempted to maximize the market returns using strategies of predicting change of market and of responding to such market change immediately. Since a film is classified as one of experiential products, it is not easy to predict a box office record and the initial number of audiences before the film is released. And also, the number of audiences fluctuates with a variety of factors after the film is released. So, the production company and distribution company try to be guaranteed the number of screens at the opining time of a newly released by multiplex chains. However, the multiplex chains tend to open the screening schedule during only a week and then determine the number of screening of the forthcoming week based on the box office record and the evaluation of audiences. Many previous researches have conducted to deal with the prediction of box office records of films. In the early stage, the researches attempted to identify factors affecting the box office record. And nowadays, many studies have tried to apply various analytic techniques to the factors identified previously in order to improve the accuracy of prediction and to explain the effect of each factor instead of identifying new factors affecting the box office record. However, most of previous researches have limitations in that they used the total number of audiences from the opening to the end as a target variable, and this makes it difficult to predict and respond to the demand of market which changes dynamically. Therefore, the purpose of this study is to predict the weekly number of audiences of a newly released film so that the stakeholder can flexibly and elastically respond to the change of the number of audiences in the film. To that end, we considered the factors used in the previous studies affecting box office and developed new factors not used in previous studies such as the order of opening of movies, dynamics of sales. Along with the comprehensive factors, we used the machine learning method such as Random Forest, Multi Layer Perception, Support Vector Machine, and Naive Bays, to predict the number of cumulative visitors from the first week after a film release to the third week. At the point of the first and the second week, we predicted the cumulative number of visitors of the forthcoming week for a released film. And at the point of the third week, we predict the total number of visitors of the film. In addition, we predicted the total number of cumulative visitors also at the point of the both first week and second week using the same factors. As a result, we found the accuracy of predicting the number of visitors at the forthcoming week was higher than that of predicting the total number of them in all of three weeks, and also the accuracy of the Random Forest was the highest among the machine learning methods we used. This study has implications in that this study 1) considered various factors comprehensively which affect the box office record and merely addressed by other previous researches such as the weekly rating of audiences after release, the weekly rank of the film after release, and the weekly sales share after release, and 2) tried to predict and respond to the demand of market which changes dynamically by suggesting models which predicts the weekly number of audiences of newly released films so that the stakeholders can flexibly and elastically respond to the change of the number of audiences in the film.

Comparison of Models for Stock Price Prediction Based on Keyword Search Volume According to the Social Acceptance of Artificial Intelligence (인공지능의 사회적 수용도에 따른 키워드 검색량 기반 주가예측모형 비교연구)

  • Cho, Yujung;Sohn, Kwonsang;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.1
    • /
    • pp.103-128
    • /
    • 2021
  • Recently, investors' interest and the influence of stock-related information dissemination are being considered as significant factors that explain stock returns and volume. Besides, companies that develop, distribute, or utilize innovative new technologies such as artificial intelligence have a problem that it is difficult to accurately predict a company's future stock returns and volatility due to macro-environment and market uncertainty. Market uncertainty is recognized as an obstacle to the activation and spread of artificial intelligence technology, so research is needed to mitigate this. Hence, the purpose of this study is to propose a machine learning model that predicts the volatility of a company's stock price by using the internet search volume of artificial intelligence-related technology keywords as a measure of the interest of investors. To this end, for predicting the stock market, we using the VAR(Vector Auto Regression) and deep neural network LSTM (Long Short-Term Memory). And the stock price prediction performance using keyword search volume is compared according to the technology's social acceptance stage. In addition, we also conduct the analysis of sub-technology of artificial intelligence technology to examine the change in the search volume of detailed technology keywords according to the technology acceptance stage and the effect of interest in specific technology on the stock market forecast. To this end, in this study, the words artificial intelligence, deep learning, machine learning were selected as keywords. Next, we investigated how many keywords each week appeared in online documents for five years from January 1, 2015, to December 31, 2019. The stock price and transaction volume data of KOSDAQ listed companies were also collected and used for analysis. As a result, we found that the keyword search volume for artificial intelligence technology increased as the social acceptance of artificial intelligence technology increased. In particular, starting from AlphaGo Shock, the keyword search volume for artificial intelligence itself and detailed technologies such as machine learning and deep learning appeared to increase. Also, the keyword search volume for artificial intelligence technology increases as the social acceptance stage progresses. It showed high accuracy, and it was confirmed that the acceptance stages showing the best prediction performance were different for each keyword. As a result of stock price prediction based on keyword search volume for each social acceptance stage of artificial intelligence technologies classified in this study, the awareness stage's prediction accuracy was found to be the highest. The prediction accuracy was different according to the keywords used in the stock price prediction model for each social acceptance stage. Therefore, when constructing a stock price prediction model using technology keywords, it is necessary to consider social acceptance of the technology and sub-technology classification. The results of this study provide the following implications. First, to predict the return on investment for companies based on innovative technology, it is most important to capture the recognition stage in which public interest rapidly increases in social acceptance of the technology. Second, the change in keyword search volume and the accuracy of the prediction model varies according to the social acceptance of technology should be considered in developing a Decision Support System for investment such as the big data-based Robo-advisor recently introduced by the financial sector.

Preliminary Inspection Prediction Model to select the on-Site Inspected Foreign Food Facility using Multiple Correspondence Analysis (차원축소를 활용한 해외제조업체 대상 사전점검 예측 모형에 관한 연구)

  • Hae Jin Park;Jae Suk Choi;Sang Goo Cho
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.1
    • /
    • pp.121-142
    • /
    • 2023
  • As the number and weight of imported food are steadily increasing, safety management of imported food to prevent food safety accidents is becoming more important. The Ministry of Food and Drug Safety conducts on-site inspections of foreign food facilities before customs clearance as well as import inspection at the customs clearance stage. However, a data-based safety management plan for imported food is needed due to time, cost, and limited resources. In this study, we tried to increase the efficiency of the on-site inspection by preparing a machine learning prediction model that pre-selects the companies that are expected to fail before the on-site inspection. Basic information of 303,272 foreign food facilities and processing businesses collected in the Integrated Food Safety Information Network and 1,689 cases of on-site inspection information data collected from 2019 to April 2022 were collected. After preprocessing the data of foreign food facilities, only the data subject to on-site inspection were extracted using the foreign food facility_code. As a result, it consisted of a total of 1,689 data and 103 variables. For 103 variables, variables that were '0' were removed based on the Theil-U index, and after reducing by applying Multiple Correspondence Analysis, 49 characteristic variables were finally derived. We build eight different models and perform hyperparameter tuning through 5-fold cross validation. Then, the performance of the generated models are evaluated. The research purpose of selecting companies subject to on-site inspection is to maximize the recall, which is the probability of judging nonconforming companies as nonconforming. As a result of applying various algorithms of machine learning, the Random Forest model with the highest Recall_macro, AUROC, Average PR, F1-score, and Balanced Accuracy was evaluated as the best model. Finally, we apply Kernal SHAP (SHapley Additive exPlanations) to present the selection reason for nonconforming facilities of individual instances, and discuss applicability to the on-site inspection facility selection system. Based on the results of this study, it is expected that it will contribute to the efficient operation of limited resources such as manpower and budget by establishing an imported food management system through a data-based scientific risk management model.

A Study on Daytime Transparent Cloud Detection through Machine Learning: Using GK-2A/AMI (기계학습을 통한 주간 반투명 구름탐지 연구: GK-2A/AMI를 이용하여)

  • Byeon, Yugyeong;Jin, Donghyun;Seong, Noh-hun;Woo, Jongho;Jeon, Uujin;Han, Kyung-Soo
    • Korean Journal of Remote Sensing
    • /
    • v.38 no.6_1
    • /
    • pp.1181-1189
    • /
    • 2022
  • Clouds are composed of tiny water droplets, ice crystals, or mixtures suspended in the atmosphere and cover about two-thirds of the Earth's surface. Cloud detection in satellite images is a very difficult task to separate clouds and non-cloud areas because of similar reflectance characteristics to some other ground objects or the ground surface. In contrast to thick clouds, which have distinct characteristics, thin transparent clouds have weak contrast between clouds and background in satellite images and appear mixed with the ground surface. In order to overcome the limitations of transparent clouds in cloud detection, this study conducted cloud detection focusing on transparent clouds using machine learning techniques (Random Forest [RF], Convolutional Neural Networks [CNN]). As reference data, Cloud Mask and Cirrus Mask were used in MOD35 data provided by MOderate Resolution Imaging Spectroradiometer (MODIS), and the pixel ratio of training data was configured to be about 1:1:1 for clouds, transparent clouds, and clear sky for model training considering transparent cloud pixels. As a result of the qualitative comparison of the study, bothRF and CNN successfully detected various types of clouds, including transparent clouds, and in the case of RF+CNN, which mixed the results of the RF model and the CNN model, the cloud detection was well performed, and was confirmed that the limitations of the model were improved. As a quantitative result of the study, the overall accuracy (OA) value of RF was 92%, CNN showed 94.11%, and RF+CNN showed 94.29% accuracy.

Self-optimizing feature selection algorithm for enhancing campaign effectiveness (캠페인 효과 제고를 위한 자기 최적화 변수 선택 알고리즘)

  • Seo, Jeoung-soo;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.4
    • /
    • pp.173-198
    • /
    • 2020
  • For a long time, many studies have been conducted on predicting the success of campaigns for customers in academia, and prediction models applying various techniques are still being studied. Recently, as campaign channels have been expanded in various ways due to the rapid revitalization of online, various types of campaigns are being carried out by companies at a level that cannot be compared to the past. However, customers tend to perceive it as spam as the fatigue of campaigns due to duplicate exposure increases. Also, from a corporate standpoint, there is a problem that the effectiveness of the campaign itself is decreasing, such as increasing the cost of investing in the campaign, which leads to the low actual campaign success rate. Accordingly, various studies are ongoing to improve the effectiveness of the campaign in practice. This campaign system has the ultimate purpose to increase the success rate of various campaigns by collecting and analyzing various data related to customers and using them for campaigns. In particular, recent attempts to make various predictions related to the response of campaigns using machine learning have been made. It is very important to select appropriate features due to the various features of campaign data. If all of the input data are used in the process of classifying a large amount of data, it takes a lot of learning time as the classification class expands, so the minimum input data set must be extracted and used from the entire data. In addition, when a trained model is generated by using too many features, prediction accuracy may be degraded due to overfitting or correlation between features. Therefore, in order to improve accuracy, a feature selection technique that removes features close to noise should be applied, and feature selection is a necessary process in order to analyze a high-dimensional data set. Among the greedy algorithms, SFS (Sequential Forward Selection), SBS (Sequential Backward Selection), SFFS (Sequential Floating Forward Selection), etc. are widely used as traditional feature selection techniques. It is also true that if there are many risks and many features, there is a limitation in that the performance for classification prediction is poor and it takes a lot of learning time. Therefore, in this study, we propose an improved feature selection algorithm to enhance the effectiveness of the existing campaign. The purpose of this study is to improve the existing SFFS sequential method in the process of searching for feature subsets that are the basis for improving machine learning model performance using statistical characteristics of the data to be processed in the campaign system. Through this, features that have a lot of influence on performance are first derived, features that have a negative effect are removed, and then the sequential method is applied to increase the efficiency for search performance and to apply an improved algorithm to enable generalized prediction. Through this, it was confirmed that the proposed model showed better search and prediction performance than the traditional greed algorithm. Compared with the original data set, greed algorithm, genetic algorithm (GA), and recursive feature elimination (RFE), the campaign success prediction was higher. In addition, when performing campaign success prediction, the improved feature selection algorithm was found to be helpful in analyzing and interpreting the prediction results by providing the importance of the derived features. This is important features such as age, customer rating, and sales, which were previously known statistically. Unlike the previous campaign planners, features such as the combined product name, average 3-month data consumption rate, and the last 3-month wireless data usage were unexpectedly selected as important features for the campaign response, which they rarely used to select campaign targets. It was confirmed that base attributes can also be very important features depending on the type of campaign. Through this, it is possible to analyze and understand the important characteristics of each campaign type.