• Title/Summary/Keyword: Anomaly data detection

Development of Medical Cost Prediction Model Based on the Machine Learning Algorithm (머신러닝 알고리즘 기반의 의료비 예측 모델 개발)

  • Han Bi KIM;Dong Hoon HAN
    • Journal of Korea Artificial Intelligence Association / v.1 no.1 / pp.11-16 / 2023
  • Accurate hospital case modeling and prediction are crucial for efficient healthcare. In this study, we demonstrate the implementation of regression analysis methods in machine learning systems utilizing mathematical statistics and machine learning techniques. The developed machine learning model includes Bayesian linear, artificial neural network, decision tree, decision forest, and linear regression analysis models. Through the application of these algorithms, corresponding regression models were constructed and analyzed. The results suggest the potential of leveraging machine learning systems for medical research. The experiment aimed to create an Azure Machine Learning Studio tool for the speedy evaluation of multiple regression models. The tool facilitates the comparison of five types of regression models in a unified experiment and presents assessment results with performance metrics. Evaluation of the regression models highlighted the advantages of boosted decision tree regression and decision forest regression in hospital case prediction. These findings could lay the groundwork for the deliberate development of new directions in medical data processing and decision making. Potential avenues for future research include exploring methods such as clustering, classification, and anomaly detection in healthcare systems.

An Assessment of Applicability of Heat Waves Using Extreme Forecast Index in KMA Climate Prediction System (GloSea5) (기상청 현업 기후예측시스템(GloSea5)에서의 극한예측지수를 이용한 여름철 폭염 예측 성능 평가)

  • Heo, Sol-Ip;Hyun, Yu-Kyung;Ryu, Young;Kang, Hyun-Suk;Lim, Yoon-Jin;Kim, Yoonjae
    • Atmosphere / v.29 no.3 / pp.257-267 / 2019
  • This study assesses the applicability of the Extreme Forecast Index (EFI) algorithm of the ECMWF seasonal forecast system to the Global Seasonal Forecasting System version 5 (GloSea5), the operational seasonal forecast system of the Korea Meteorological Administration (KMA). The EFI is based on the difference between the Cumulative Distribution Function (CDF) curves of the model's climate data and of the current ensemble forecast distribution, which is essential for diagnosing predictability in extreme cases. To investigate its applicability, experiments were conducted for two heat-wave cases (1994 and 2003), and the EFI based on GloSea5 hindcast data was compared with ERA-Interim anomaly data. The data were also used to determine quantitative estimates of Probability Of Detection (POD), False Alarm Ratio (FAR), and spatial pattern correlation. The results showed that the area where the ERA-Interim temperature anomaly exceeded 4 degrees corresponded to the area where the EFI was 0.8 or above. POD was high (0.7 and 0.9, respectively) when the ERA-Interim anomalies were highest (on Jul. 11, 1994 (> $5^{\circ}C$) and Aug. 8, 2003 (> $7^{\circ}C$), respectively). The spatial pattern correlation was high, in the range of 0.5~0.9, but decreased as the lead time increased. Furthermore, a validation using GloSea5 forecast data for the 2018 Korean heat wave showed that the EFI predicted the event successfully at lead times of two to three weeks. As a result, EFI forecasts can be used to predict the probability that an extreme weather event of interest might occur. Overall, we expect these results to be useful for extreme weather forecasting.
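The POD and FAR scores named in this abstract follow from a standard 2x2 contingency table of forecast versus observed events. The sketch below is illustrative only: the grid-point values are made up, and treating "EFI of 0.8 or above" as the forecast event and "anomaly above 4 degrees" as the observed event is an assumption taken from the thresholds mentioned in the abstract, not a statement of the authors' exact procedure.

```python
def contingency_scores(forecast, observed):
    """Return (POD, FAR) from paired boolean forecast/observation flags."""
    pairs = list(zip(forecast, observed))
    hits = sum(1 for f, o in pairs if f and o)
    misses = sum(1 for f, o in pairs if not f and o)
    false_alarms = sum(1 for f, o in pairs if f and not o)
    # POD = hits / (hits + misses); FAR = false alarms / (hits + false alarms)
    pod = hits / (hits + misses) if hits + misses else float("nan")
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else float("nan")
    return pod, far


# Hypothetical values at six grid points (not data from the paper).
efi = [0.90, 0.85, 0.30, 0.82, 0.10, 0.95]
anomaly_c = [5.2, 4.5, 4.1, 2.0, 0.5, 6.0]

forecast = [v >= 0.8 for v in efi]       # assumed forecast event: EFI >= 0.8
observed = [a > 4.0 for a in anomaly_c]  # assumed observed event: anomaly > 4 degrees

pod, far = contingency_scores(forecast, observed)
print(pod, far)
```

With these made-up values there are three hits, one miss, and one false alarm.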

Stateful Virtual Proxy Server for Attack Detection based on SIP Protocol State Monitoring Mechanism (SIP 프로토콜 상태정보 기반 공격 탐지 기능을 제공하는 가상 프록시 서버 설계 및 구현)

  • Lee, Hyung-Woo
    • Journal of Internet Computing and Services / v.9 no.6 / pp.37-48 / 2008
  • VoIP service transmits voice data using the SIP protocol over an IP-based network. The SIP protocol has many advantages, such as providing IP-based voice communication and multimedia services at low communication cost, and it has therefore spread very quickly. However, the SIP protocol exposes new vulnerabilities to malicious attacks such as Message Flooding attacks and protocol parsing attacks. It also suffers from many of the existing vulnerabilities of IP-based protocols. In this paper, we propose a new Virtual Proxy Server system placed in front of the existing Proxy Server for anomaly detection of SIP attacks and stateful management of SIP sessions with enhanced security. Based on the stateful virtual proxy server, our solution shows promising SIP Message Flooding attack verification and detection performance with minimal latency in SIP packet transmission.

An Anomalous Event Detection System based on Information Theory (엔트로피 기반의 이상징후 탐지 시스템)

  • Han, Chan-Kyu;Choi, Hyoung-Kee
    • Journal of KIISE:Information Networking / v.36 no.3 / pp.173-183 / 2009
  • We present a real-time monitoring system for detecting anomalous network events using entropy. Entropy accounts for the effects of disorder in the system. When an abnormal factor arises and agitates the current system, the entropy must show an abrupt change. In this paper we deliberately model the Internet to measure the entropy. Packets flowing between these two networks may incur to sustain the current value. In the proposed system we keep track of the value of entropy over time to pinpoint sudden changes in the value. The time-series data of entropy are transformed into a two-dimensional domain to help visually inspect the activities on the network. We examine the system on a testbed using network traffic traces containing notorious worms and DoS attacks. Furthermore, we compare our proposed system with time-series forecasting methods such as EWMA, Holt-Winters, and PCA in terms of sensitivity. The results suggest that our approach is able to detect anomalies with fairly high accuracy. Our contributions are twofold: (1) highly sensitive detection of anomalies and (2) visualization of network activities to alert on anomalies.
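The idea of tracking entropy over time and flagging sudden changes can be sketched as follows. This is a minimal illustration, not the authors' system: the choice of feature (a destination label per packet), the fixed windows, and the change threshold are all assumptions made for the example.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def abrupt_changes(windows, threshold):
    """Return per-window entropies and the indices of windows whose entropy
    differs from the previous window by more than `threshold` bits."""
    entropies = [shannon_entropy(w) for w in windows]
    flagged = [
        i for i in range(1, len(entropies))
        if abs(entropies[i] - entropies[i - 1]) > threshold
    ]
    return entropies, flagged


# Hypothetical destination labels observed in four consecutive time windows.
windows = [
    ["a", "b", "c", "d"],  # evenly spread traffic: 2 bits
    ["a", "b", "c", "d"],
    ["a", "a", "a", "a"],  # traffic concentrated on one host: 0 bits
    ["a", "b", "c", "d"],
]

entropies, flagged = abrupt_changes(windows, threshold=1.0)
print(entropies, flagged)
```

Both the collapse of entropy in the third window and the recovery in the fourth are flagged, since each is a change of 2 bits relative to the preceding window.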

Noise-Robust Porcine Respiratory Diseases Classification Using Texture Analysis and CNN (질감 분석과 CNN을 이용한 잡음에 강인한 돼지 호흡기 질병 식별)

  • Choi, Yongju;Lee, Jonguk;Park, Daihee;Chung, Yongwha
    • KIPS Transactions on Software and Data Engineering / v.7 no.3 / pp.91-98 / 2018
  • Automatic detection of pig wasting diseases is an important issue in the management of group-housed pigs. In particular, porcine respiratory diseases are one of the main causes of mortality among pigs and of productivity loss in intensive pig farming. In this paper, we propose a noise-robust system for the early detection and recognition of pig wasting diseases using sound data. In this method, we first convert one-dimensional sound signals to two-dimensional gray-level images by normalization and extract texture images by means of the dominant neighborhood structure technique. The texture features are then used as inputs to convolutional neural networks that serve as an early anomaly detector and a respiratory disease classifier. Our experimental results show that this new method can detect pig wasting diseases both economically (low-cost sound sensor) and accurately (over 96% accuracy) even under noisy environmental conditions, either as a standalone solution or as a complement to known methods to obtain a more accurate solution.

A Survey on Deep Learning-based Analysis for Education Data (빅데이터와 AI를 활용한 교육용 자료의 분석에 대한 조사)

  • Lho, Young-uhg
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.240-243 / 2021
  • Recently, research has been reported on applying big data and AI technologies to evaluation and individual learning in education. Innovations in information technology make it possible to collect dynamic and complex data, including student personal records, physiological data, learning logs and activities, and learning outcomes, from sources such as social media, MOOCs, intelligent tutoring systems, LMSs, sensors, and mobile devices. In addition, e-learning has generated a large amount of learning data in the COVID-19 environment. Learning analytics and AI technology are expected to be applied to extract meaningful patterns and discover knowledge from this data. From the learner's perspective, it is necessary to identify student learning and emotional behavior patterns and profiles, improve assessment and evaluation methods, predict individual student learning outcomes or dropout, and research adaptive systems for personalized support. This study aims to contribute to research in the field of education by surveying and classifying the machine learning technologies used in anomaly detection and recommendation systems for educational data.

Two-Phase Approach for Data Quality Management for Slope Stability Monitoring (경사면의 안정성 모니터링 데이터의 품질관리를 위한 2 단계 접근방안)

  • Junhyuk Choi;Yongjin Kim;Junhwi Cho;Woocheol Jeong;Songhee Suk;Song Choi;Yongseong Kim;Bongjun Ji
    • Journal of the Korean Geosynthetics Society / v.22 no.1 / pp.67-74 / 2023
  • Research on data-based slope failure prediction and early warning for monitoring slope stability is increasing. However, most papers overlook the quality of the data. Poor data quality can cause problems such as false alarms. Therefore, this paper proposes a two-step hybrid approach consisting of rules and machine learning models for quality control of data collected from slopes. The rule-based method has the advantage of high accuracy and intuitive interpretation, and the machine learning model has the advantage of being able to derive patterns that cannot be explicitly expressed. The hybrid approach was able to combine both of these advantages. In a case study, the performance of each method used alone was compared with that of the hybrid approach, and the hybrid approach was judged to perform well. Therefore, using the hybrid approach is judged to be more appropriate for data quality control than using either method alone.

A Design of Time-based Anomaly Intrusion Detection Model (시간 기반의 비정상 행위 침입탐지 모델 설계)

  • Shin, Mi-Yea;Jeong, Yoon-Su;Lee, Sang-Ho
    • Journal of the Korea Institute of Information and Communication Engineering / v.15 no.5 / pp.1066-1072 / 2011
  • In the method that analyzes the relationships among system call orders, normal system call orders are divided into segments of a fixed size to generate genes, which are used as detectors. In the method that considers system call parameters, the mean and standard deviation of the parameter lengths are used as detectors. An attack in which the system call order is normal but the parameter values are changed, such as a format string attack, cannot be detected by a method that considers only the system call order, whereas a model that considers only the system call parameters has the drawback of a high positive defect rate, because the parameters are considered individually and information from intervals where the attack has not yet been initiated is included. To solve these problems, a more efficient learning and detection method is needed that groups consecutive system call orders and parameters, so that the various attack-related characteristics of system calls are considered simultaneously. In this article, we detect anomalies in system call orders and parameters by applying a temporal concept to them in order to improve the positive defect rate, that is, the misjudgment of an anomaly as normal. Experiments on the DARPA data set showed that the proposed method improved the positive defect rate by 13% in the system call order model that considers time, compared with the model that does not.
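The parameter-based detector described in this abstract (mean and standard deviation of parameter lengths) can be illustrated with a small sketch. It covers only that baseline idea, not the time-based model the paper proposes. The trace format, the multiplier `k`, and the floor of 1.0 on the standard deviation are assumptions introduced for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def train(normal_calls):
    """Learn (mean, std) of argument length per system call name
    from (name, argument) pairs observed during normal operation."""
    lengths = defaultdict(list)
    for name, arg in normal_calls:
        lengths[name].append(len(arg))
    return {name: (mean(v), pstdev(v)) for name, v in lengths.items()}


def is_anomalous(profile, name, arg, k=3.0):
    """Flag a call whose argument length deviates from the learned mean by
    more than k standard deviations. Unseen call names are flagged as well.
    The floor of 1.0 on the deviation is an assumption to avoid a zero band."""
    if name not in profile:
        return True
    mu, sigma = profile[name]
    return abs(len(arg) - mu) > k * max(sigma, 1.0)


# Hypothetical normal trace (not taken from the DARPA data set).
normal = [
    ("open", "/etc/passwd"),
    ("open", "/etc/hosts"),
    ("open", "/tmp/a.txt"),
]
profile = train(normal)

print(is_anomalous(profile, "open", "/etc/motd"))  # similar length
print(is_anomalous(profile, "open", "%x" * 40))    # unusually long argument
```

A detector of this kind judges each parameter in isolation, which is the limitation the abstract attributes to parameter-only models.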

A Distributed Real-time Self-Diagnosis System for Processing Large Amounts of Log Data (대용량 로그 데이터 처리를 위한 분산 실시간 자가 진단 시스템)

  • Son, Siwoon;Kim, Dasol;Moon, Yang-Sae;Choi, Hyung-Jin
    • Database Research / v.34 no.3 / pp.58-68 / 2018
  • Distributed computing helps to efficiently store and process large data on a cluster of multiple machines. The performance of distributed computing depends greatly on the state of the servers constituting the distributed system. In this paper, we propose a self-diagnosis system that collects log data in a distributed system, detects anomalies, and visualizes the results in real time. First, we divide the self-diagnosis process into five stages: collecting, delivering, analyzing, storing, and visualizing. Next, we design a real-time self-diagnosis system that meets the goals of real-time processing, scalability, and high availability. The proposed system is based on Apache Flume, Apache Kafka, and Apache Storm, which are representative real-time distributed technologies. In addition, we use a simple but effective anomaly detection technique based on the moving average and the 3-sigma rule to minimize the delay of log data processing during the self-diagnosis process. Based on the results of this paper, a distributed real-time self-diagnosis solution can be constructed that diagnoses server status in real time in a complicated distributed system.
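A moving-average, 3-sigma detector of the kind mentioned in this abstract can be sketched in a few lines. The window size, the use of the population standard deviation, and the sample metric values are assumptions for illustration. The paper's system runs its analysis on Apache Storm, which this standalone sketch does not attempt to reproduce.

```python
from collections import deque
from statistics import mean, pstdev


def three_sigma_anomalies(values, window=5):
    """Return indices of values lying more than 3 standard deviations from
    the moving average of the preceding `window` values."""
    recent = deque(maxlen=window)
    flagged = []
    for i, x in enumerate(values):
        if len(recent) == window:  # only test once the window is full
            mu = mean(recent)
            sigma = pstdev(recent)
            if abs(x - mu) > 3 * sigma:
                flagged.append(i)
        recent.append(x)  # the oldest value drops out automatically
    return flagged


# Hypothetical per-interval metric extracted from server logs.
cpu_load = [10, 11, 9, 10, 10, 11, 10, 50, 10, 9]
print(three_sigma_anomalies(cpu_load))
```

Each value needs only a constant amount of work over a small window, which suits a low-latency streaming setting. Note that in this sketch a flagged value still enters the window and widens the band for the next few points.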

Detection of Signs of Hostile Cyber Activity against External Networks based on Autoencoder (오토인코더 기반의 외부망 적대적 사이버 활동 징후 감지)

  • Park, Hansol;Kim, Kookjin;Jeong, Jaeyeong;Jang, Jisu;Youn, Jaepil;Shin, Dongkyoo
    • Journal of Internet Computing and Services / v.23 no.6 / pp.39-48 / 2022
  • Cyberattacks around the world continue to increase, and their damage extends beyond government facilities to civilians. This underscores the importance of developing a system that can identify and detect cyber anomalies early. To identify cyber anomalies effectively, several studies have trained machine learning models on BGP (Border Gateway Protocol) data to identify anomalies. However, BGP data is imbalanced: abnormal data is scarcer than normal data. This biases the model's learning and reduces the reliability of the results. In addition, security personnel in an actual cyber situation cannot readily grasp that situation from typical machine learning output alone. Therefore, in this paper, we investigate BGP data, which keeps network records from around the world, and solve the data imbalance problem by using SMOTE. Then, assuming a cyber range situation, an autoencoder classifies cyber anomalies and the classified data is visualized. By learning the pattern of normal data, the model classified abnormal data with 92.4% accuracy, and the auxiliary index also showed 90% performance, supporting the reliability of the results. In addition, visualizing the congested cyberspace is expected to make it possible to recognize the situation effectively and thus defend against cyberattacks.
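The core of SMOTE, the oversampling method named in this abstract, is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The following pure-Python sketch shows only that interpolation step. The feature vectors, `k`, and the seed are made up, and the sketch does not reproduce the authors' pipeline or any library implementation.

```python
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Generate `n_synthetic` points, each lying on the segment between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours. Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        # k nearest neighbours of x among the other minority samples,
        # ranked by squared Euclidean distance
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Hypothetical two-feature "abnormal" samples (not real BGP data).
abnormal = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0), (1.2, 3.0)]
new_points = smote(abnormal, n_synthetic=4)
print(new_points)
```

Because each synthetic point is a convex combination of two existing minority samples, it stays inside the region those samples already span.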