• Title/Summary/Keyword: Log parsing

Search Result 7, Processing Time 0.021 seconds

Accurate and Efficient Log Template Discovery Technique

  • Tak, Byungchul
    • Journal of the Korea Society of Computer and Information
    • /
    • v.23 no.10
    • /
    • pp.11-21
    • /
    • 2018
  • In this paper we propose a novel log template discovery algorithm which achieves high quality of discovered log templates through iterative log filtering technique. Log templates are the static string pattern of logs that are used to produce actual logs by inserting variable values during runtime. Identifying individual logs into their template category correctly enables us to conduct automated analysis using state-of-the-art machine learning techniques. Our technique looks at the group of logs column-wise and filters the logs that have the value of the highest proportion. We repeat this process per each column until we are left with highly homogeneous set of logs that most likely belong to the same log template category. Then, we determine which column is the static part and which is the variable part by vertically comparing all the logs in the group. This process repeats until we have discovered all the templates from given logs. Also, during this process we discover the custom patterns such as ID formats that are unique to the application. This information helps us quickly identify such strings in the logs as variable parts thereby further increasing the accuracy of the discovered log templates. Existing solutions suffer from log templates being too general or too specific because of the inability to detect custom patterns. Through extensive evaluations we have learned that our proposed method achieves 2 to 20 times better accuracy.

Auto Configuration Module for Logstash in Elasticsearch Ecosystem

  • Ahmed, Hammad;Park, Yoosang;Choi, Jongsun;Choi, Jaeyoung
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2018.10a
    • /
    • pp.39-42
    • /
    • 2018
  • Log analysis and monitoring have a significant importance in most of the systems. Log management has core importance in applications like distributed applications, cloud based applications, and applications designed for big data. These applications produce a large number of log files which contain essential information. This information can be used for log analytics to understand the relevant patterns from varying log data. However, they need some tools for the purpose of parsing, storing, and visualizing log informations. "Elasticsearch, Logstash, and Kibana"(ELK Stack) is one of the most popular analyzing tools for log management. For the ingestion of log files configuration files have a key importance, as they cover all the services needed to input, process, and output the log files. However, creating configuration files is sometimes very complicated and time consuming in many applications as it requires domain expertise and manual creation. In this paper, an auto configuration module for Logstash is proposed which aims to auto generate the configuration files for Logstash. The primary purpose of this paper is to provide a mechanism, which can be used to auto generate the configuration files for corresponding log files in less time. The proposed module aims to provide an overall efficiency in the log management system.

High-Speed Search Mechanism based on B-Tree Index Vector for Huge Web Log Mining and Web Attack Detection (대용량 웹 로그 마이닝 및 공격탐지를 위한 B-트리 인덱스 벡터 기반 고속 검색 기법)

  • Lee, Hyung-Woo;Kim, Tae-Su
    • Journal of Korea Multimedia Society
    • /
    • v.11 no.11
    • /
    • pp.1601-1614
    • /
    • 2008
  • The number of web service users has been increased rapidly as existing services are changed into the web-based internet applications. Therefore, it is necessary for us to use web log pre-processing technique to detect attacks on diverse web service transactions and it is also possible to extract web mining information. However, existing mechanisms did not provide efficient pre-processing procedures for a huge volume of web log data. In this paper, we proposed both a field based parsing and a high-speed log indexing mechanism based on the suggested B-tree Index Vector structure for performance enhancement. In experiments, the proposed mechanism provides an efficient web log pre-processing and search functions with a session classification. Therefore it is useful to enhance web attack detection function.

  • PDF

Design and Implementation of Advanced Web Log Preprocess Algorithm for Rule based Web IDS (룰 기반 웹 IDS 시스템을 위한 효율적인 웹 로그 전처리 기법 설계 및 구현)

  • Lee, Hyung-Woo
    • Journal of Internet Computing and Services
    • /
    • v.9 no.5
    • /
    • pp.23-34
    • /
    • 2008
  • The number of web service user is increasing steadily as web-based service is offered in various form. But, web service has a vulnerability such as SQL Injection, Parameter Injection and DoS attack. Therefore, it is required for us to develop Web IDS system and additionally to offer Rule-base intrusion detection/response mechanism against those attacks. However, existing Web IDS system didn't correspond properly on recent web attack mechanism because they didn't including suitable pre-processing procedure on huge web log data. Therfore, we propose an efficient web log pre-processing mechanism for enhancing rule based detection and improving the performance of web IDS base attack response system. Proposed algorithm provides both a field unit parsing and a duplicated string elimination procedure on web log data. And it is also possible for us to construct improved web IDS system.

  • PDF

An Interpretable Log Anomaly System Using Bayesian Probability and Closed Sequence Pattern Mining (베이지안 확률 및 폐쇄 순차패턴 마이닝 방식을 이용한 설명가능한 로그 이상탐지 시스템)

  • Yun, Jiyoung;Shin, Gun-Yoon;Kim, Dong-Wook;Kim, Sang-Soo;Han, Myung-Mook
    • Journal of Internet Computing and Services
    • /
    • v.22 no.2
    • /
    • pp.77-87
    • /
    • 2021
  • With the development of the Internet and personal computers, various and complex attacks begin to emerge. As the attacks become more complex, signature-based detection become difficult. It leads to the research on behavior-based log anomaly detection. Recent work utilizes deep learning to learn the order and it shows good performance. Despite its good performance, it does not provide any explanation for prediction. The lack of explanation can occur difficulty of finding contamination of data or the vulnerability of the model itself. As a result, the users lose their reliability of the model. To address this problem, this work proposes an explainable log anomaly detection system. In this study, log parsing is the first to proceed. Afterward, sequential rules are extracted by Bayesian posterior probability. As a result, the "If condition then results, post-probability" type rule set is extracted. If the sample is matched to the ruleset, it is normal, otherwise, it is an anomaly. We utilize HDFS datasets for the experiment, resulting in F1score 92.7% in test dataset.

Implementation of Security Information and Event Management for Realtime Anomaly Detection and Visualization (실시간 이상 행위 탐지 및 시각화 작업을 위한 보안 정보 관리 시스템 구현)

  • Kim, Nam Gyun;Park, Sang Seon
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology
    • /
    • v.8 no.5
    • /
    • pp.303-314
    • /
    • 2018
  • In the past few years, government agencies and corporations have succumbed to stealthy, tailored cyberattacks designed to exploit vulnerabilities, disrupt operations and steal valuable information. Security Information and Event Management (SIEM) is useful tool for cyberattacks. SIEM solutions are available in the market but they are too expensive and difficult to use. Then we implemented basic SIEM functions to research and development for future security solutions. We focus on collection, aggregation and analysis of real-time logs from host. This tool allows parsing and search of log data for forensics. Beyond just log management it uses intrusion detection and prioritize of security events inform and support alerting to user. We select Elastic Stack to process and visualization of these security informations. Elastic Stack is a very useful tool for finding information from large data, identifying correlations and creating rich visualizations for monitoring. We suggested using vulnerability check results on our SIEM. We have attacked to the host and got real time user activity for monitoring, alerting and security auditing based this security information management.

Predicting the Direction of the Stock Index by Using a Domain-Specific Sentiment Dictionary (주가지수 방향성 예측을 위한 주제지향 감성사전 구축 방안)

  • Yu, Eunji;Kim, Yoosin;Kim, Namgyu;Jeong, Seung Ryul
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.1
    • /
    • pp.95-110
    • /
    • 2013
  • Recently, the amount of unstructured data being generated through a variety of social media has been increasing rapidly, resulting in the increasing need to collect, store, search for, analyze, and visualize this data. This kind of data cannot be handled appropriately by using the traditional methodologies usually used for analyzing structured data because of its vast volume and unstructured nature. In this situation, many attempts are being made to analyze unstructured data such as text files and log files through various commercial or noncommercial analytical tools. Among the various contemporary issues dealt with in the literature of unstructured text data analysis, the concepts and techniques of opinion mining have been attracting much attention from pioneer researchers and business practitioners. Opinion mining or sentiment analysis refers to a series of processes that analyze participants' opinions, sentiments, evaluations, attitudes, and emotions about selected products, services, organizations, social issues, and so on. In other words, many attempts based on various opinion mining techniques are being made to resolve complicated issues that could not have otherwise been solved by existing traditional approaches. One of the most representative attempts using the opinion mining technique may be the recent research that proposed an intelligent model for predicting the direction of the stock index. This model works mainly on the basis of opinions extracted from an overwhelming number of economic news repots. News content published on various media is obviously a traditional example of unstructured text data. Every day, a large volume of new content is created, digitalized, and subsequently distributed to us via online or offline channels. Many studies have revealed that we make better decisions on political, economic, and social issues by analyzing news and other related information. In this sense, we expect to predict the fluctuation of stock markets partly by analyzing the relationship between economic news reports and the pattern of stock prices. So far, in the literature on opinion mining, most studies including ours have utilized a sentiment dictionary to elicit sentiment polarity or sentiment value from a large number of documents. A sentiment dictionary consists of pairs of selected words and their sentiment values. Sentiment classifiers refer to the dictionary to formulate the sentiment polarity of words, sentences in a document, and the whole document. However, most traditional approaches have common limitations in that they do not consider the flexibility of sentiment polarity, that is, the sentiment polarity or sentiment value of a word is fixed and cannot be changed in a traditional sentiment dictionary. In the real world, however, the sentiment polarity of a word can vary depending on the time, situation, and purpose of the analysis. It can also be contradictory in nature. The flexibility of sentiment polarity motivated us to conduct this study. In this paper, we have stated that sentiment polarity should be assigned, not merely on the basis of the inherent meaning of a word but on the basis of its ad hoc meaning within a particular context. To implement our idea, we presented an intelligent investment decision-support model based on opinion mining that performs the scrapping and parsing of massive volumes of economic news on the web, tags sentiment words, classifies sentiment polarity of the news, and finally predicts the direction of the next day's stock index. In addition, we applied a domain-specific sentiment dictionary instead of a general purpose one to classify each piece of news as either positive or negative. For the purpose of performance evaluation, we performed intensive experiments and investigated the prediction accuracy of our model. For the experiments to predict the direction of the stock index, we gathered and analyzed 1,072 articles about stock markets published by "M" and "E" media between July 2011 and September 2011.