• Title/Summary/Keyword: Log data

Search Result 2,120, Processing Time 0.03 seconds

Consumer behavior prediction using Airbnb web log data (에어비앤비(Airbnb) 웹 로그 데이터를 이용한 고객 행동 예측)

  • An, Hyoin;Choi, Yuri;Oh, Raeeun;Song, Jongwoo
    • The Korean Journal of Applied Statistics
    • /
    • v.32 no.3
    • /
    • pp.391-404
    • /
    • 2019
  • Customers' fixed characteristics have often been used to predict customer behavior. It has recently become possible to track customer web logs as customer activities move from offline to online. It has become possible to collect large amounts of web log data; however, the researchers only focused on organizing the log data or describing the technical characteristics. In this study, we predict the decision-making time until each customer makes the first reservation, using Airbnb customer data provided by the Kaggle website. This data set includes basic customer information such as gender, age, and web logs. We use various methodologies to find the optimal model and compare prediction errors for cases with web log data and without it. We consider six models such as Lasso, SVM, Random Forest, and XGBoost to explore the effectiveness of the web log data. As a result, we choose Random Forest as our optimal model with a misclassification rate of about 20%. In addition, we confirm that using web log data in our study doubles the prediction accuracy in predicting customer behavior compared to not using it.

Auto Configuration Module for Logstash in Elasticsearch Ecosystem

  • Ahmed, Hammad;Park, Yoosang;Choi, Jongsun;Choi, Jaeyoung
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2018.10a
    • /
    • pp.39-42
    • /
    • 2018
  • Log analysis and monitoring have a significant importance in most of the systems. Log management has core importance in applications like distributed applications, cloud based applications, and applications designed for big data. These applications produce a large number of log files which contain essential information. This information can be used for log analytics to understand the relevant patterns from varying log data. However, they need some tools for the purpose of parsing, storing, and visualizing log informations. "Elasticsearch, Logstash, and Kibana"(ELK Stack) is one of the most popular analyzing tools for log management. For the ingestion of log files configuration files have a key importance, as they cover all the services needed to input, process, and output the log files. However, creating configuration files is sometimes very complicated and time consuming in many applications as it requires domain expertise and manual creation. In this paper, an auto configuration module for Logstash is proposed which aims to auto generate the configuration files for Logstash. The primary purpose of this paper is to provide a mechanism, which can be used to auto generate the configuration files for corresponding log files in less time. The proposed module aims to provide an overall efficiency in the log management system.

Design and Analysis of the Log Authentication Mechanism based on the Merkle Tree (Merkle Tree 기반의 로그인증 메커니즘 설계 및 분석)

  • Lee, Jung yeob;Park, Chang seop
    • Convergence Security Journal
    • /
    • v.17 no.1
    • /
    • pp.3-13
    • /
    • 2017
  • As security log plays important roles in various fields, the integrity of log data become more and more important. Especially, the stored log data is an immediate target of the intruder to erase his trace in the system penetrated. Several theoretical schemes to guarantee the forward secure integrity have been proposed, even though they cannot provide the integrity of the log data after the system is penetrated. Authentication tags of these methods are based on the linear-hash chain. In this case, it is difficult to run partial validation and to accelerate generating and validating authentication tags. In this paper, we propose a log authentication mechanism, based on Mekle Tree, which is easy to do partial validation and able to apply multi threading.

A Study on Process Management Method of Offshore Plant Piping Material using Process Mining Technique (프로세스 마이닝 기법을 이용한 해양플랜트 배관재 제작 공정 관리 방법에 관한 연구)

  • Park, JungGoo;Kim, MinGyu;Woo, JongHun
    • Journal of the Society of Naval Architects of Korea
    • /
    • v.56 no.2
    • /
    • pp.143-151
    • /
    • 2019
  • This study describes a method for analyzing log data generated in a process using process mining techniques. A system for collecting and analyzing a large amount of log data generated in the process of manufacturing an offshore plant piping material was constructed. The analyzed data was visualized through various methods. Through the analysis of the process model, it was evaluated whether the process performance was correctly input. Through the pattern analysis of the log data, it is possible to check beforehand whether the problem process occurred. In addition, we analyzed the process performance data of partner companies and identified the load of their processes. These data can be used as reference data for pipe production allocation. Real-time decision-making is required to cope with the various variances that arise in offshore plant production. To do this, we have built a system that can analyze the log data of real - time system and make decisions.

XML-based Windows Event Log Forensic tool design and implementation (XML기반 Windows Event Log Forensic 도구 설계 및 구현)

  • Kim, Jongmin;Lee, DongHwi
    • Convergence Security Journal
    • /
    • v.20 no.5
    • /
    • pp.27-32
    • /
    • 2020
  • The Windows Event Log is a Log that defines the overall behavior of the system, and these files contain data that can detect various user behaviors and signs of anomalies. However, since the Event Log is generated for each action, it takes a considerable amount of time to analyze the log. Therefore, in this study, we designed and implemented an XML-based Event Log analysis tool based on the main Event Log list of "Spotting the Adversary with Windows Event Log Monitoring" presented at the NSA.

Bayesian Analysis in Generalized Log-Gamma Censored Regression Model

  • Younshik chung;Yoomi Kang
    • Communications for Statistical Applications and Methods
    • /
    • v.5 no.3
    • /
    • pp.733-742
    • /
    • 1998
  • For industrial and medical lifetime data, the generalized log-gamma regression model is considered. Then the Bayesian analysis for the generalized log-gamma regression with censored data are explained and following the data augmentation (Tanner and Wang; 1987), the censored data is replaced by simulated data. To overcome the complicated Bayesian computation, Makov Chain Monte Carlo (MCMC) method is employed. Then some modified algorithms are proposed to implement MCMC. Finally, one example is presented.

  • PDF

The Method of Analyzing Firewall Log Data using MapReduce based on NoSQL (NoSQL기반의 MapReduce를 이용한 방화벽 로그 분석 기법)

  • Choi, Bomin;Kong, Jong-Hwan;Hong, Sung-Sam;Han, Myung-Mook
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.23 no.4
    • /
    • pp.667-677
    • /
    • 2013
  • As the firewall is a typical network security equipment, it is usually installed at most of internal/external networks and makes many packet data in/out. So analyzing the its logs stored in it can provide important and fundamental data on the network security research. However, along with development of communications technology, the speed of internet network is improved and then the amount of log data is becoming 'Massive Data' or 'BigData'. In this trend, there are limits to analyze log data using the traditional database model RDBMS. In this paper, through our Method of Analyzing Firewall log data using MapReduce based on NoSQL, we have discovered that the introducing NoSQL data base model can more effectively analyze the massive log data than the traditional one. We have demonstrated execellent performance of the NoSQL by comparing the performance of data processing with existing RDBMS. Also the proposed method is evaluated by experiments that detect the three attack patterns and shown that it is highly effective.

Real time predictive analytic system design and implementation using Bigdata-log (빅데이터 로그를 이용한 실시간 예측분석시스템 설계 및 구현)

  • Lee, Sang-jun;Lee, Dong-hoon
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.25 no.6
    • /
    • pp.1399-1410
    • /
    • 2015
  • Gartner is requiring companies to considerably change their survival paradigms insisting that companies need to understand and provide again the upcoming era of data competition. With the revealing of successful business cases through statistic algorithm-based predictive analytics, also, the conversion into preemptive countermeasure through predictive analysis from follow-up action through data analysis in the past is becoming a necessity of leading enterprises. This trend is influencing security analysis and log analysis and in reality, the cases regarding the application of the big data analysis framework to large-scale log analysis and intelligent and long-term security analysis are being reported file by file. But all the functions and techniques required for a big data log analysis system cannot be accommodated in a Hadoop-based big data platform, so independent platform-based big data log analysis products are still being provided to the market. This paper aims to suggest a framework, which is equipped with a real-time and non-real-time predictive analysis engine for these independent big data log analysis systems and can cope with cyber attack preemptively.

A Stability Verification of Backup System for Disaster Recovery (재해 복구를 위한 백업 시스템의 안정성 검증)

  • Lee, Moon-Goo
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.49 no.9
    • /
    • pp.205-214
    • /
    • 2012
  • The main thing that IT operation managers consider is protecting assets of corporation from system failure and disaster. Therefore, this research proposed a backup system for a disaster recovery. Previous backup method is that if database update occurs, this record is saved in redo log, and if the size of record file is over than expected, this file is saved in archive log in order. Thus, it is possible to occur errors of data loss from the process of data backup which change in real time while changes of database occur. Suggested backup system is back redo log up to database of transaction log in real time, and back a record that can be omitted from previous backup method up to archive log. When recover the data, it is possible to recover redo log in real time online, and it minimizes data loss. Also, throughout multi thread processing method data recovery is performed and it is designed that system performance is improved. To verify stability of backup system CPN(Coloured Petri Net) is introduced, and each step of backup system is displayed in diagram form, and th e stability is verified based on the definition and theorem of CPN.

Notes on the Ratio and the Right-Tail Probability in a Log-Laplace Distribution

  • Woo, Jung-Soo
    • Journal of the Korean Data and Information Science Society
    • /
    • v.18 no.4
    • /
    • pp.1171-1177
    • /
    • 2007
  • We consider estimation of the right-tail probability in a log-Laplace random variable, As we derive the density of ratio of two independent log-Laplace random variables, the k-th moment of the ratio is represented by a special mathematical function. and hence variance of the ratio can be represented by a psi-function.

  • PDF