• Title/Summary/Keyword: 파싱

Search Result 385, Processing Time 0.029 seconds

An Interpretable Log Anomaly System Using Bayesian Probability and Closed Sequence Pattern Mining (베이지안 확률 및 폐쇄 순차패턴 마이닝 방식을 이용한 설명가능한 로그 이상탐지 시스템)

  • Yun, Jiyoung;Shin, Gun-Yoon;Kim, Dong-Wook;Kim, Sang-Soo;Han, Myung-Mook
    • Journal of Internet Computing and Services
    • /
    • v.22 no.2
    • /
    • pp.77-87
    • /
    • 2021
  • With the development of the Internet and personal computers, various and complex attacks begin to emerge. As the attacks become more complex, signature-based detection become difficult. It leads to the research on behavior-based log anomaly detection. Recent work utilizes deep learning to learn the order and it shows good performance. Despite its good performance, it does not provide any explanation for prediction. The lack of explanation can occur difficulty of finding contamination of data or the vulnerability of the model itself. As a result, the users lose their reliability of the model. To address this problem, this work proposes an explainable log anomaly detection system. In this study, log parsing is the first to proceed. Afterward, sequential rules are extracted by Bayesian posterior probability. As a result, the "If condition then results, post-probability" type rule set is extracted. If the sample is matched to the ruleset, it is normal, otherwise, it is an anomaly. We utilize HDFS datasets for the experiment, resulting in F1score 92.7% in test dataset.

Improved Method for Learning Context-Free Grammar using Tabular representation

  • Jung, Soon-Ho
    • Journal of the Korea Society of Computer and Information
    • /
    • v.27 no.2
    • /
    • pp.43-51
    • /
    • 2022
  • In this paper, we suggest the method to improve the existing method leaning context-free grammar(CFG) using tabular representation(TBL) as a chromosome of genetic algorithm in grammatical inference and show the more efficient experimental result. We have two improvements. The first is to improve the formula to reflect the learning evaluation of positive and negative examples at the same time for the fitness function. The second is to classify partitions corresponding to TBLs generated from positive learning examples according to the size of the learning string, proceed with the evolution process by class, and adjust the composition ratio according to the success rate to apply the learning method linked to survival in the next generation. These improvements provide better efficiency than the existing method by solving the complexity and difficulty in the crossover and generalization steps between several individuals according to the size of the learning examples. We experiment with the languages proposed in the existing method, and the results show a rather fast generation rate that takes fewer generations to complete learning with the same success rate than the existing method. In the future, this method can be tried for extended CYK, and furthermore, it suggests the possibility of being applied to more complex parsing tables.

Implementation of Security Information and Event Management for Realtime Anomaly Detection and Visualization (실시간 이상 행위 탐지 및 시각화 작업을 위한 보안 정보 관리 시스템 구현)

  • Kim, Nam Gyun;Park, Sang Seon
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology
    • /
    • v.8 no.5
    • /
    • pp.303-314
    • /
    • 2018
  • In the past few years, government agencies and corporations have succumbed to stealthy, tailored cyberattacks designed to exploit vulnerabilities, disrupt operations and steal valuable information. Security Information and Event Management (SIEM) is useful tool for cyberattacks. SIEM solutions are available in the market but they are too expensive and difficult to use. Then we implemented basic SIEM functions to research and development for future security solutions. We focus on collection, aggregation and analysis of real-time logs from host. This tool allows parsing and search of log data for forensics. Beyond just log management it uses intrusion detection and prioritize of security events inform and support alerting to user. We select Elastic Stack to process and visualization of these security informations. Elastic Stack is a very useful tool for finding information from large data, identifying correlations and creating rich visualizations for monitoring. We suggested using vulnerability check results on our SIEM. We have attacked to the host and got real time user activity for monitoring, alerting and security auditing based this security information management.

Design and Implementation of Medical Information System using QR Code (QR 코드를 이용한 의료정보 시스템 설계 및 구현)

  • Lee, Sung-Gwon;Jeong, Chang-Won;Joo, Su-Chong
    • Journal of Internet Computing and Services
    • /
    • v.16 no.2
    • /
    • pp.109-115
    • /
    • 2015
  • The new medical device technologies for bio-signal information and medical information which developed in various forms have been increasing. Information gathering techniques and the increasing of the bio-signal information device are being used as the main information of the medical service in everyday life. Hence, there is increasing in utilization of the various bio-signals, but it has a problem that does not account for security reasons. Furthermore, the medical image information and bio-signal of the patient in medical field is generated by the individual device, that make the situation cannot be managed and integrated. In order to solve that problem, in this paper we integrated the QR code signal associated with the medial image information including the finding of the doctor and the bio-signal information. bio-signal. System implementation environment for medical imaging devices and bio-signal acquisition was configured through bio-signal measurement, smart device and PC. For the ROI extraction of bio-signal and the receiving of image information that transfer from the medical equipment or bio-signal measurement, .NET Framework was used to operate the QR server module on Window Server 2008 operating system. The main function of the QR server module is to parse the DICOM file generated from the medical imaging device and extract the identified ROI information to store and manage in the database. Additionally, EMR, patient health information such as OCS, extracted ROI information needed for basic information and emergency situation is managed by QR code. QR code and ROI management and the bio-signal information file also store and manage depending on the size of receiving the bio-singnal information case with a PID (patient identification) to be used by the bio-signal device. If the receiving of information is not less than the maximum size to be converted into a QR code, the QR code and the URL information can access the bio-signal information through the server. Likewise, .Net Framework is installed to provide the information in the form of the QR code, so the client can check and find the relevant information through PC and android-based smart device. Finally, the existing medical imaging information, bio-signal information and the health information of the patient are integrated over the result of executing the application service in order to provide a medical information service which is suitable in medical field.

A Study on the Effect of Using Sentiment Lexicon in Opinion Classification (오피니언 분류의 감성사전 활용효과에 대한 연구)

  • Kim, Seungwoo;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.1
    • /
    • pp.133-148
    • /
    • 2014
  • Recently, with the advent of various information channels, the number of has continued to grow. The main cause of this phenomenon can be found in the significant increase of unstructured data, as the use of smart devices enables users to create data in the form of text, audio, images, and video. In various types of unstructured data, the user's opinion and a variety of information is clearly expressed in text data such as news, reports, papers, and various articles. Thus, active attempts have been made to create new value by analyzing these texts. The representative techniques used in text analysis are text mining and opinion mining. These share certain important characteristics; for example, they not only use text documents as input data, but also use many natural language processing techniques such as filtering and parsing. Therefore, opinion mining is usually recognized as a sub-concept of text mining, or, in many cases, the two terms are used interchangeably in the literature. Suppose that the purpose of a certain classification analysis is to predict a positive or negative opinion contained in some documents. If we focus on the classification process, the analysis can be regarded as a traditional text mining case. However, if we observe that the target of the analysis is a positive or negative opinion, the analysis can be regarded as a typical example of opinion mining. In other words, two methods (i.e., text mining and opinion mining) are available for opinion classification. Thus, in order to distinguish between the two, a precise definition of each method is needed. In this paper, we found that it is very difficult to distinguish between the two methods clearly with respect to the purpose of analysis and the type of results. We conclude that the most definitive criterion to distinguish text mining from opinion mining is whether an analysis utilizes any kind of sentiment lexicon. We first established two prediction models, one based on opinion mining and the other on text mining. Next, we compared the main processes used by the two prediction models. Finally, we compared their prediction accuracy. We then analyzed 2,000 movie reviews. The results revealed that the prediction model based on opinion mining showed higher average prediction accuracy compared to the text mining model. Moreover, in the lift chart generated by the opinion mining based model, the prediction accuracy for the documents with strong certainty was higher than that for the documents with weak certainty. Most of all, opinion mining has a meaningful advantage in that it can reduce learning time dramatically, because a sentiment lexicon generated once can be reused in a similar application domain. Additionally, the classification results can be clearly explained by using a sentiment lexicon. This study has two limitations. First, the results of the experiments cannot be generalized, mainly because the experiment is limited to a small number of movie reviews. Additionally, various parameters in the parsing and filtering steps of the text mining may have affected the accuracy of the prediction models. However, this research contributes a performance and comparison of text mining analysis and opinion mining analysis for opinion classification. In future research, a more precise evaluation of the two methods should be made through intensive experiments.