• Title/Summary/Keyword: Machine learning (ML)

Search Result 290, Processing Time 0.029 seconds

Improvement of Underground Cavity and Structure Detection Performance Through Machine Learning-based Diffraction Separation of GPR Data (기계학습 기반 회절파 분리 적용을 통한 GPR 탐사 자료의 도로 하부 공동 및 구조물 탐지 성능 향상)

  • Sooyoon Kim;Joongmoo Byun
    • Geophysics and Geophysical Exploration
    • /
    • v.26 no.4
    • /
    • pp.171-184
    • /
    • 2023
  • Machine learning (ML)-based cavity detection using a large amount of survey data obtained from vehicle-mounted ground penetrating radar (GPR) has been actively studied to identify underground cavities. However, only simple image processing techniques have been used for preprocessing the ML input, and many conventional seismic and GPR data processing techniques, which have been used for decades, have not been fully exploited. In this study, based on the idea that a cavity can be identified using diffraction, we applied ML-based diffraction separation to GPR data to increase the accuracy of cavity detection using the YOLO v5 model. The original ML-based seismic diffraction separation technique was modified, and the separated diffraction image was used as the input to train the cavity detection model. The performance of the proposed method was verified using public GPR data released by the Seoul Metropolitan Government. Underground cavities and objects were more accurately detected using separated diffraction images. In the future, the proposed method can be useful in various fields in which GPR surveys are used.

AutoML Machine Learning-Based for Detecting Qshing Attacks Malicious URL Classification Technology Research and Service Implementation (큐싱 공격 탐지를 위한 AutoML 머신러닝 기반 악성 URL 분류 기술 연구 및 서비스 구현)

  • Dong-Young Kim;Gi-Seong Hwang
    • Smart Media Journal
    • /
    • v.13 no.6
    • /
    • pp.9-15
    • /
    • 2024
  • In recent trends, there has been an increase in 'Qshing' attacks, a hybrid form of phishing that exploits fake QR (Quick Response) codes impersonating government agencies to steal personal and financial information. Particularly, this attack method is characterized by its stealthiness, as victims can be redirected to phishing pages or led to download malicious software simply by scanning a QR code, making it difficult for them to realize they have been targeted. In this paper, we have developed a classification technique utilizing machine learning algorithms to identify the maliciousness of URLs embedded in QR codes, and we have explored ways to integrate this with existing QR code readers. To this end, we constructed a dataset from 128,587 malicious URLs and 428,102 benign URLs, extracting 35 different features such as protocol and parameters, and used AutoML to identify the optimal algorithm and hyperparameters, achieving an accuracy of approximately 87.37%. Following this, we designed the integration of the trained classification model with existing QR code readers to implement a service capable of countering Qshing attacks. In conclusion, our findings confirm that deriving an optimized algorithm for classifying malicious URLs in QR codes and integrating it with existing QR code readers presents a viable solution to combat Qshing attacks.

Exploiting Korean Language Model to Improve Korean Voice Phishing Detection (한국어 언어 모델을 활용한 보이스피싱 탐지 기능 개선)

  • Boussougou, Milandu Keith Moussavou;Park, Dong-Joo
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.11 no.10
    • /
    • pp.437-446
    • /
    • 2022
  • Text classification task from Natural Language Processing (NLP) combined with state-of-the-art (SOTA) Machine Learning (ML) and Deep Learning (DL) algorithms as the core engine is widely used to detect and classify voice phishing call transcripts. While numerous studies on the classification of voice phishing call transcripts are being conducted and demonstrated good performances, with the increase of non-face-to-face financial transactions, there is still the need for improvement using the latest NLP technologies. This paper conducts a benchmarking of Korean voice phishing detection performances of the pre-trained Korean language model KoBERT, against multiple other SOTA algorithms based on the classification of related transcripts from the labeled Korean voice phishing dataset called KorCCVi. The results of the experiments reveal that the classification accuracy on a test set of the KoBERT model outperforms the performances of all other models with an accuracy score of 99.60%.

MLOps workflow language and platform for time series data anomaly detection

  • Sohn, Jung-Mo;Kim, Su-Min
    • Journal of the Korea Society of Computer and Information
    • /
    • v.27 no.11
    • /
    • pp.19-27
    • /
    • 2022
  • In this study, we propose a language and platform to describe and manage the MLOps(Machine Learning Operations) workflow for time series data anomaly detection. Time series data is collected in many fields, such as IoT sensors, system performance indicators, and user access. In addition, it is used in many applications such as system monitoring and anomaly detection. In order to perform prediction and anomaly detection of time series data, the MLOps platform that can quickly and flexibly apply the analyzed model to the production environment is required. Thus, we developed Python-based AI/ML Modeling Language (AMML) to easily configure and execute MLOps workflows. Python is widely used in data analysis. The proposed MLOps platform can extract and preprocess time series data from various data sources (R-DB, NoSql DB, Log File, etc.) using AMML and predict it through a deep learning model. To verify the applicability of AMML, the workflow for generating a transformer oil temperature prediction deep learning model was configured with AMML and it was confirmed that the training was performed normally.

Comparative Study of Data Preprocessing and ML&DL Model Combination for Daily Dam Inflow Prediction (댐 일유입량 예측을 위한 데이터 전처리와 머신러닝&딥러닝 모델 조합의 비교연구)

  • Youngsik Jo;Kwansue Jung
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2023.05a
    • /
    • pp.358-358
    • /
    • 2023
  • 본 연구에서는 그동안 수자원분야 강우유출 해석분야에 활용되었던 대표적인 머신러닝&딥러닝(ML&DL) 모델을 활용하여 모델의 하이퍼파라미터 튜닝뿐만 아니라 모델의 특성을 고려한 기상 및 수문데이터의 조합과 전처리(lag-time, 이동평균 등)를 통하여 데이터 특성과 ML&DL모델의 조합시나리오에 따른 일 유입량 예측성능을 비교 검토하는 연구를 수행하였다. 이를 위해 소양강댐 유역을 대상으로 1974년에서 2021년까지 축적된 기상 및 수문데이터를 활용하여 1) 강우, 2) 유입량, 3) 기상자료를 주요 영향변수(독립변수)로 고려하고, 이에 a) 지체시간(lag-time), b) 이동평균, c) 유입량의 성분분리조건을 적용하여 총 36가지 시나리오 조합을 ML&DL의 입력자료로 활용하였다. ML&DL 모델은 1) Linear Regression(LR), 2) Lasso, 3) Ridge, 4) SVR(Support Vector Regression), 5) Random Forest(RF), 6) LGBM(Light Gradient Boosting Model), 7) XGBoost의 7가지 ML방법과 8) LSTM(Long Short-Term Memory models), 9) TCN(Temporal Convolutional Network), 10) LSTM-TCN의 3가지 DL 방법, 총 10가지 ML&DL모델을 비교 검토하여 일유입량 예측을 위한 가장 적합한 데이터 조합 특성과 ML&DL모델을 성능평가와 함께 제시하였다. 학습된 모형의 유입량 예측 결과를 비교·분석한 결과, 소양강댐 유역에서는 딥러닝 중에서는 TCN모형이 가장 우수한 성능을 보였고(TCN>TCN-LSTM>LSTM), 트리기반 머신러닝중에서는 Random Forest와 LGBM이 우수한 성능을 보였으며(RF, LGBM>XGB), SVR도 LGBM수준의 우수한 성능을 나타내었다. LR, Lasso, Ridge 세가지 Regression모형은 상대적으로 낮은 성능을 보였다. 또한 소양강댐 댐유입량 예측에 대하여 강우, 유입량, 기상계열을 36가지로 조합한 결과, 입력자료에 lag-time이 적용된 강우계열의 조합 분석에서 세가지 Regression모델을 제외한 모든 모형에서 NSE(Nash-Sutcliffe Efficiency) 0.8이상(최대 0.867)의 성능을 보였으며, lag-time이 적용된 강우와 유입량계열을 조합했을 경우 NSE 0.85이상(최대 0.901)의 더 우수한 성능을 보였다.

  • PDF

Calibration of Portable Particulate Mattere-Monitoring Device using Web Query and Machine Learning

  • Loh, Byoung Gook;Choi, Gi Heung
    • Safety and Health at Work
    • /
    • v.10 no.4
    • /
    • pp.452-460
    • /
    • 2019
  • Background: Monitoring and control of PM2.5 are being recognized as key to address health issues attributed to PM2.5. Availability of low-cost PM2.5 sensors made it possible to introduce a number of portable PM2.5 monitors based on light scattering to the consumer market at an affordable price. Accuracy of light scatteringe-based PM2.5 monitors significantly depends on the method of calibration. Static calibration curve is used as the most popular calibration method for low-cost PM2.5 sensors particularly because of ease of application. Drawback in this approach is, however, the lack of accuracy. Methods: This study discussed the calibration of a low-cost PM2.5-monitoring device (PMD) to improve the accuracy and reliability for practical use. The proposed method is based on construction of the PM2.5 sensor network using Message Queuing Telemetry Transport (MQTT) protocol and web query of reference measurement data available at government-authorized PM monitoring station (GAMS) in the republic of Korea. Four machine learning (ML) algorithms such as support vector machine, k-nearest neighbors, random forest, and extreme gradient boosting were used as regression models to calibrate the PMD measurements of PM2.5. Performance of each ML algorithm was evaluated using stratified K-fold cross-validation, and a linear regression model was used as a reference. Results: Based on the performance of ML algorithms used, regression of the output of the PMD to PM2.5 concentrations data available from the GAMS through web query was effective. The extreme gradient boosting algorithm showed the best performance with a mean coefficient of determination (R2) of 0.78 and standard error of 5.0 ㎍/㎥, corresponding to 8% increase in R2 and 12% decrease in root mean square error in comparison with the linear regression model. Minimum 100 hours of calibration period was found required to calibrate the PMD to its full capacity. Calibration method proposed poses a limitation on the location of the PMD being in the vicinity of the GAMS. As the number of the PMD participating in the sensor network increases, however, calibrated PMDs can be used as reference devices to nearby PMDs that require calibration, forming a calibration chain through MQTT protocol. Conclusions: Calibration of a low-cost PMD, which is based on construction of PM2.5 sensor network using MQTT protocol and web query of reference measurement data available at a GAMS, significantly improves the accuracy and reliability of a PMD, thereby making practical use of the low-cost PMD possible.

Academic Registration Text Classification Using Machine Learning

  • Alhawas, Mohammed S;Almurayziq, Tariq S
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.1
    • /
    • pp.93-96
    • /
    • 2022
  • Natural language processing (NLP) is utilized to understand a natural text. Text analysis systems use natural language algorithms to find the meaning of large amounts of text. Text classification represents a basic task of NLP with a wide range of applications such as topic labeling, sentiment analysis, spam detection, and intent detection. The algorithm can transform user's unstructured thoughts into more structured data. In this work, a text classifier has been developed that uses academic admission and registration texts as input, analyzes its content, and then automatically assigns relevant tags such as admission, graduate school, and registration. In this work, the well-known algorithms support vector machine SVM and K-nearest neighbor (kNN) algorithms are used to develop the above-mentioned classifier. The obtained results showed that the SVM classifier outperformed the kNN classifier with an overall accuracy of 98.9%. in addition, the mean absolute error of SVM was 0.0064 while it was 0.0098 for kNN classifier. Based on the obtained results, the SVM is used to implement the academic text classification in this work.

Exploring Simultaneous Presentation in Online Restaurant Reviews: An Analysis of Textual and Visual Content

  • Lin Li;Gang Ren;Taeho Hong;Sung-Byung Yang
    • Asia pacific journal of information systems
    • /
    • v.29 no.2
    • /
    • pp.181-202
    • /
    • 2019
  • The purpose of this study is to explore the effect of different types of simultaneous presentation (i.e., reviewer information, textual and visual content, and similarity between textual-visual contents) on review usefulness and review enjoyment in online restaurant reviews (ORRs), as they are interrelated yet have rarely been examined together in previous research. By using Latent Dirichlet Allocation (LDA) topic modeling and state-of-the-art machine learning (ML) methodologies, we found that review readability in textual content and salient objects in images in visual content have a significant impact on both review usefulness and review enjoyment. Moreover, similarity between textual-visual contents was found to be a major factor in determining review usefulness but not review enjoyment. As for reviewer information, reputation, expertise, and location of residence, these were found to be significantly related to review enjoyment. This study contributes to the body of knowledge on ORRs and provides valuable implications for general users and managers in the hospitality and tourism industries.

Comparative analysis of deep learning performance for Python and C# using Keras (Keras를 이용한 Python과 C#의 딥러닝 성능 비교 분석)

  • Lee, Sung-jin;Moon, Sang-Ho
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2022.10a
    • /
    • pp.360-363
    • /
    • 2022
  • According to the 2018 Kaggle ML & DS Survey, among the proportions of frameworks for machine learning and data science, TensorFlow and Keras each account for 41.82%. It was found to be 34.09%, and in the case of development programming, it is confirmed that about 82% use Python. A significant number of machine learning and deep learning structures utilize the Keras framework and Python, but in the case of Python, distribution and execution are limited to the Python script environment due to the script language, so it is judged that it is difficult to operate in various environments. This paper implemented a machine learning and deep learning system using C# and Keras running in Visual Studio 2019. Using the Mnist dataset, 100 tests were performed in Python 3.8,2 and C# .NET 5.0 environments, and the minimum time for Python was 1.86 seconds, the maximum time was 2.38 seconds, and the average time was 1.98 seconds. Time 1.78 seconds, maximum time 2.11 seconds, average time 1.85 seconds, total time 37.02 seconds. As a result of the experiment, the performance of C# improved by about 6% compared to Python, and it is expected that the utilization will be high because executable files can be extracted.

  • PDF

Effective teaching using textbooks and AI web apps (교과서와 AI 웹앱을 활용한 효과적인 교육방식)

  • Sobirjon, Habibullaev;Yakhyo, Mamasoliev;Kim, Ki-Hawn
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2022.01a
    • /
    • pp.211-213
    • /
    • 2022
  • Images in the textbooks influence the learning process. Students often see pictures before reading the text and these pictures can enhance the power of imagination of the students. The findings of some researches show that the images in textbooks can increase students' creativity. However, when learning major subjects, reading a textbook or looking at a picture alone may not be enough to understand the topics and completely realize the concepts. Studies show that viewers remember 95% of a message when watching a video than reading a text. If we can combine textbooks and videos, this teaching method is fantastic. The "TEXT + IMAGE + VIDEO (Animation)" concept could be more beneficial than ordinary ones. We tried to give our solution by using machine learning Image Classification. This paper covers the features, approaches and detailed objectives of our project. For now, we have developed the prototype of this project as a web app and it only works when accessed via smartphone. Once you have accessed the web app through your smartphone, the web app asks for access to use the camera. Suppose you bring your smartphone's camera closer to the picture in the textbook. It will then display the video related to the photo below.

  • PDF