• Title/Summary/Keyword: Web Log Analysis (웹 로그분석)


Greedy Query Optimization Performance Analysis for Join Continuous Query over Data Streams (데이터 스트림 환경에서의 조인 연속 질의의 그리디 질의 최적화 성능 분석)

  • Park, Hong-Kyu;Lee, Won-Suk
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2006.11a
    • /
    • pp.361-364
    • /
    • 2006
  • Recently, interest has shifted from bounded data sets toward data stream processing, such as sensor data processing and the analysis of various transaction logs including web server logs and call records; in particular, interest in query processing over data streams is growing. This paper studies how to process join continuous queries that join two or more streams, and analyzes their performance. The cost of each join is defined by a join cost model based on the streams' input rates and join selectivities, an optimization technique using a greedy algorithm is proposed, and experiments examine how the optimization algorithm performs in various stream environments.

  • PDF
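
The abstract describes the cost model and greedy optimizer only at a high level. The sketch below is a hypothetical Python illustration of the idea, assuming per-stream arrival rates and pairwise join selectivities are known; it is not the authors' algorithm.

```python
from itertools import combinations

def sel(selectivity, a, b):
    # Selectivity is symmetric; default to 1.0 if a pair is unknown.
    return selectivity.get((a, b), selectivity.get((b, a), 1.0))

def greedy_join_order(rates, selectivity):
    """rates: {stream: arrival rate (tuples/sec)}
    selectivity: {(a, b): estimated join selectivity between streams a and b}
    Greedily build a join order, always picking the cheapest next join by the
    estimated cost rate(a) * rate(b) * selectivity(a, b)."""
    remaining = set(rates)
    first = min(combinations(remaining, 2),
                key=lambda p: rates[p[0]] * rates[p[1]] * sel(selectivity, *p))
    order = list(first)
    remaining -= set(first)
    # Estimated output rate of the partial join result so far.
    out_rate = rates[first[0]] * rates[first[1]] * sel(selectivity, *first)
    while remaining:
        nxt = min(remaining,
                  key=lambda s: out_rate * rates[s] *
                      min(sel(selectivity, s, t) for t in order))
        order.append(nxt)
        out_rate *= rates[nxt] * min(sel(selectivity, nxt, t) for t in order[:-1])
        remaining.remove(nxt)
    return order

# Hypothetical example: three streams with different rates and selectivities.
rates = {"S1": 100, "S2": 20, "S3": 50}
selectivity = {("S1", "S2"): 0.01, ("S2", "S3"): 0.05, ("S1", "S3"): 0.001}
print(greedy_join_order(rates, selectivity))
```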

A Dynamic Recommendation System Using User Log Analysis and Document Similarity in Clusters (사용자 로그 분석과 클러스터 내의 문서 유사도를 이용한 동적 추천 시스템)

  • 김진수;김태용;최준혁;임기욱;이정현
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.5
    • /
    • pp.586-594
    • /
    • 2004
  • Because web documents are created and disappear rapidly, users need a recommendation system that lets them browse web documents conveniently and accurately. One largely untapped source of knowledge about large data collections lies in the cumulative experience of individuals finding useful information in the collection. Recommendation systems attempt to extract such useful information by capturing and mining one or more measures of the usefulness of the data. Existing information filtering systems have the shortcoming that they must maintain a user profile, while collaborative filtering systems require users to rate each web document first; in high-quantity, low-quality environments, users may cover only a tiny percentage of the available documents. Dynamic recommendation systems based only on the user's browsing pattern also present users with unrelated web documents. This paper classifies web documents using inter-document similarity within each document type and extracts a user browsing sequential-pattern DB from users' session information in the web server log file. When a user accesses a web document, the proposed dynamic recommendation system recommends a Top-N set of associated web documents with high similarity to the current document, together with a set that has sequential specificity, using the extracted information and the user's session information.
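
A minimal sketch of the two ingredients named in the abstract, sequential browsing patterns mined from server-log sessions and similarity-weighted Top-N recommendation, under the assumption that log entries are already parsed into (session, timestamp, url) records and that a document-similarity table exists. The helper names are illustrative, not the paper's.

```python
from collections import Counter, defaultdict

def browsing_sequences(log_entries):
    """Group pre-parsed (session_id, timestamp, url) records into per-session
    page sequences ordered by time."""
    sessions = defaultdict(list)
    for session_id, ts, url in sorted(log_entries, key=lambda e: (e[0], e[1])):
        sessions[session_id].append(url)
    return list(sessions.values())

def next_page_counts(sequences):
    """Count, for each page, which pages users visited next -- a simple
    stand-in for a sequential-pattern DB."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return counts

def recommend(current_page, counts, similarity, top_n=5):
    """Score candidates by transition frequency weighted by content
    similarity, then return the Top-N."""
    scored = {page: freq * similarity.get((current_page, page), 0.0)
              for page, freq in counts[current_page].items()}
    return sorted(scored, key=scored.get, reverse=True)[:top_n]
```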

Dynamic Linking System Using Related Web Documents Classification and Users' Browsing Patterns (연관 웹 문서 분류와 사용자 브라우징 패턴을 이용한 동적 링킹 시스템)

  • Park, Young-Kyu;Kim, Jin-Su;Kim, Tae-Yong;Lee, Jung-Hyun
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2000.10a
    • /
    • pp.305-308
    • /
    • 2000
  • Static hypertext linking based on the web site designer's subjective judgment has the drawback that every user is given the same links. To address this problem, several dynamic linking systems have been proposed that provide each user with dynamic links to web documents suited to their browsing pattern. However, most dynamic linking systems build dynamic links using only the pattern information most similar to the user's current browsing pattern, and therefore frequently offer links to unrelated web documents. In this paper, web mining, an application of data mining, is used to analyze users' browsing patterns from the web server log file, and the Association Rule Hypergraph Partitioning (ARHP) algorithm, which is suitable for multi-dimensional data sets, is used to group related web documents. A primary link set to recommend to the user is generated from the browsing-pattern information, and a secondary link set is generated from the related-document information. By dynamically recommending only the links contained in both sets, the proposed dynamic linking system allows users to browse a web site more conveniently and accurately.

  • PDF
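
The final recommendation step, keeping only links present in both the pattern-based and association-based sets, can be sketched in a few lines. This is a hypothetical illustration with made-up link sets, not the ARHP pipeline itself.

```python
def dynamic_links(pattern_links, association_links):
    """Recommend only links that appear in both the browsing-pattern-based
    set and the association-based set, preserving the pattern ranking."""
    common = set(pattern_links) & set(association_links)
    return [link for link in pattern_links if link in common]

# Hypothetical example sets.
primary = ["/news/a", "/docs/b", "/blog/c", "/docs/d"]
secondary = ["/docs/d", "/docs/b", "/shop/e"]
print(dynamic_links(primary, secondary))  # ['/docs/b', '/docs/d']
```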

Study on Recovery Techniques for the Deleted or Damaged Event Log(EVTX) Files (삭제되거나 손상된 이벤트 로그(EVTX) 파일 복구 기술에 대한 연구)

  • Shin, Yonghak;Cheon, Junyoung;Kim, Jongsung
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.26 no.2
    • /
    • pp.387-396
    • /
    • 2016
  • As the number of people using digital devices has increased, digital forensics, which aims at finding clues to crimes in digital data, has developed and become more important, especially in court. Alongside digital forensics, anti-forensics, which aims at thwarting forensic investigation, has also developed. For example, with anti-forensic techniques a criminal can delete digital evidence, making it hard for investigators to find any clue to the crime. In such cases, recovery techniques for deleted or damaged information are very important in digital forensics. Although recovery techniques for deleted files in the EVTX (event log) format have been presented, there has been no study on recovering the event log data itself. In this paper, we propose recovery algorithms for deleted or damaged event log files and show through experiments that they achieve a high success rate.
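
A minimal carving sketch in the spirit of the abstract, not the authors' algorithm: scan a raw image for EVTX chunk signatures and keep each candidate chunk for later reconstruction. It assumes the standard "ElfChnk\x00" chunk magic and 64 KiB chunk size; the image path is hypothetical.

```python
CHUNK_MAGIC = b"ElfChnk\x00"
CHUNK_SIZE = 64 * 1024

def carve_evtx_chunks(image_path):
    """Return raw 64 KiB slices starting at every EVTX chunk signature."""
    chunks = []
    with open(image_path, "rb") as f:
        data = f.read()          # fine for small images; stream for large ones
    pos = data.find(CHUNK_MAGIC)
    while pos != -1:
        chunks.append(data[pos:pos + CHUNK_SIZE])
        pos = data.find(CHUNK_MAGIC, pos + 1)
    return chunks

if __name__ == "__main__":
    recovered = carve_evtx_chunks("disk.img")   # hypothetical image path
    print(f"recovered {len(recovered)} candidate chunks")
```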

Dynamic Resource Reallocation using User Connection Pattern per Timeslot (시구간별 사용자 접속 패턴을 이용한 동적 자원 재분배)

  • 이진성;최창열;박기진;김성수
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2003.04d
    • /
    • pp.572-574
    • /
    • 2003
  • Although research on improving the performance of web server clusters has been carried out in various areas, most of it has focused only on real-time dynamic resource reallocation driven by access frequency, for example through log file analysis. In this paper we propose a mechanism that predicts access patterns from a per-timeslot analysis of access patterns and dynamically reallocates resources accordingly. The proposed mechanism reduces unnecessary waste of resources and improves cluster performance through efficient resource reallocation. We also demonstrate the similarity of access patterns across timeslots.

  • PDF
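
A hypothetical sketch of the idea, assuming per-timeslot request counts have already been extracted from the access logs: predict each timeslot's load from past observations and decide how many servers to keep active in that slot. The predictor and capacity figures are illustrative, not the paper's.

```python
import math

def predict_load(history):
    """history: {timeslot: [request rates observed on previous days]}.
    Use the mean as a simple predictor (the paper's predictor is not given)."""
    return {slot: sum(obs) / len(obs) for slot, obs in history.items()}

def servers_needed(predicted, capacity_per_server, min_servers=1):
    """Number of servers to keep active in each timeslot."""
    return {slot: max(min_servers, math.ceil(load / capacity_per_server))
            for slot, load in predicted.items()}

history = {"00-06": [120, 90, 110], "06-12": [800, 760, 820],
           "12-18": [1500, 1430, 1480], "18-24": [900, 950, 880]}
print(servers_needed(predict_load(history), capacity_per_server=400))
```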

A Study on QoS-Aware Cluster System Architecture (QoS 인지 클러스터 시스템 구조에 관한 연구)

  • 최창열;김성수
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2003.04d
    • /
    • pp.635-637
    • /
    • 2003
  • Cluster systems built from existing computing resources can be applied to application areas that require complex software. However, research is needed on a QoS-aware cluster system architecture that predicts the quality of the provided service and satisfies availability and performance requirements. In this paper, we derive a cluster system architecture equipped with various mechanisms for providing differentiated services and meeting dependability requirements, and we carry out performance analysis experiments on system operating conditions obtained through workload modeling based on log analysis of a web server providing a real commercial service.

  • PDF

An Integrated Data Mining Model for Customer Relationship Management (고객관계관리를 위한 데이터마이닝 통합모형에 관한 연구)

  • Song, Im-Young;Oh, R.D.;Yi, T.S.;Shin, K.J.;Kim, K.C.
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2006.10c
    • /
    • pp.154-159
    • /
    • 2006
  • This paper proposes an integrated model that segments customers with a clustering technique, basing the criteria for customer value on customer behavior extracted from log files automatically collected by the web server, and then classifies customers by applying a decision tree to the segmentation results. In addition, to analyze the main service usage patterns of the classified customers, association rule mining is applied to the customers' use of science and technology information, and a new method is studied for providing the information and interfaces appropriate to each user segment of a science information portal site. From a customer management perspective, this paper presents a way of operating an information-service web site that classifies its existing customers and analyzes their patterns in order to provide customer-oriented operation policies and dynamic interfaces. It can also contribute to continuous customer management by providing services suited to each customer segment.

  • PDF
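
A hypothetical sketch of the clustering-then-classification part of the integrated model, assuming a per-customer feature table (e.g. visit counts, dwell time, downloads) has already been derived from the web server log. It uses scikit-learn and is not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

# One row per customer; columns are behavioral measures taken from the log.
features = np.array([[12, 300, 4], [2, 40, 1], [25, 900, 9],
                     [3, 60, 1], [18, 500, 6], [1, 20, 0]], dtype=float)

# Step 1: segment customers by behavior.
segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)

# Step 2: fit a decision tree on the segments so that new customers can be
# classified with interpretable rules.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(features, segments)
print(tree.predict([[10, 250, 3]]))
```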

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku
    • Journal of Internet Computing and Services
    • /
    • v.14 no.6
    • /
    • pp.71-84
    • /
    • 2013
  • Log data, which record the multitude of information created while operating computer systems, are utilized in many processes, from computer system inspection and process optimization to customized optimization for users. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive amount of log data of banks. Most of the log data generated during banking operations come from handling clients' business. Therefore, a separate log data processing system needs to be established in order to gather, store, categorize, and analyze the log data generated while processing clients' business. However, in existing computing environments it is difficult to realize flexible storage expansion for a massive amount of unstructured log data and to execute the considerable number of functions needed to categorize and analyze the stored data. Thus, in this study, we use cloud computing technology to realize a cloud-based log data processing system for unstructured log data that are difficult to handle with the existing computing infrastructure's analysis tools and management systems. The proposed system uses an IaaS (Infrastructure as a Service) cloud environment to provide flexible expansion of computing resources, such as storage space and memory, under conditions such as extended storage or a rapid increase in log data. Moreover, to overcome the processing limits of existing analysis tools when real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for quick and reliable parallel-distributed processing of the massive amount of log data. Furthermore, because HDFS (Hadoop Distributed File System) stores data by generating copies of the block units of the aggregated log data, the proposed system offers automatic restore functions so that the system continues to operate after recovering from a malfunction. Finally, by establishing a distributed database using the NoSQL-based MongoDB, the proposed system provides methods for effectively processing unstructured log data. Relational databases such as MySQL have complex schemas that are inappropriate for processing unstructured log data. Further, strict schemas like those of relational databases cannot expand nodes when the stored data must be distributed to various nodes as the amount of data rapidly increases. NoSQL does not provide the complex computations that relational databases offer, but it can easily expand through node dispersion when the amount of data increases rapidly; it is a non-relational database with a structure appropriate for processing unstructured data. NoSQL data models are usually classified as Key-Value, column-oriented, and document-oriented types. Of these, MongoDB, a representative document-oriented database with a free schema structure, is used in the proposed system. MongoDB is adopted because its flexible schema structure makes it easy to process unstructured log data, it facilitates flexible node expansion when the amount of data is rapidly increasing, and it provides an Auto-Sharding function that automatically expands storage.
The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module. When the log data generated over each bank's entire client business process are sent to the cloud server, the log collector module collects and classifies the data according to log type and distributes them to the MongoDB module and the MySQL module. The log graph generator module generates the results of the log analysis of the MongoDB module, the Hadoop-based analysis module, and the MySQL module per analysis time and type of the aggregated log data, and provides them to the user through a web interface. Log data that require real-time analysis are stored in the MySQL module and provided in real time by the log graph generator module. The log data aggregated per unit time are stored in the MongoDB module and plotted in a graph according to the user's various analysis conditions. The aggregated log data in the MongoDB module are parallel-distributed and processed by the Hadoop-based analysis module. A comparative evaluation against a log data processing system that uses only MySQL, measuring log insert and query performance, demonstrates the proposed system's superiority. Moreover, an optimal chunk size is identified through a log data insert performance evaluation of MongoDB for various chunk sizes.
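
A minimal sketch of storing heterogeneous log records in MongoDB with a free schema, assuming a local mongod instance and the pymongo driver; the database, collection, and field names are illustrative, not those used in the paper's system.

```python
from datetime import datetime
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
logs = client["bank_logs"]["raw_events"]

# Documents need not share a schema, which is what makes a document store
# convenient for unstructured log data.
logs.insert_many([
    {"ts": datetime.utcnow(), "type": "transfer", "amount": 150000, "branch": "A01"},
    {"ts": datetime.utcnow(), "type": "login", "channel": "mobile", "result": "ok"},
])

# Index by time so per-unit-time aggregation queries stay fast.
logs.create_index("ts")
print(logs.count_documents({"type": "login"}))
```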

A Study on the Service Status of the Spatial Open Platform based on the Analysis of Web Server User Log: 2014.5.20.~2014.6.2. Log Data (웹 사용자 로그 분석 기반 공간정보 오픈플랫폼 서비스 사용현황 연구: 2014.5.20.~2014.6.2. 수집자료 대상)

  • Lee, Seung Han;Cho, Tae Hyun;Kim, Min Soo
    • Spatial Information Research
    • /
    • v.22 no.4
    • /
    • pp.67-76
    • /
    • 2014
  • Recently, through the development of IT and mobile technology, spatial information has come to serve as infrastructure for people's lives and the national economy. Many kinds of applications, including SNS and social commerce, leverage spatial information for their services. In Korea, a spatial open platform that can provide national spatial data infrastructure services in a stable manner has been released, and many people have taken an interest in its services. However, the open platform currently has difficulty analyzing its service status and load in real time, because it does not have a real-time monitoring system. Therefore, we propose a method that can analyze the real-time service status of the open platform using web server log information. In particular, we present analysis results for the amount of data transferred, network bandwidth, number of visitors, hit count, content usage, and connection paths. The results presented in this study are not sufficient to fully understand the service status of the open platform, but they are expected to be utilized as basic data for understanding the service status and for yearly system expansion of the platform.
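
A hypothetical sketch of the kind of web-server-log analysis described above, assuming logs in the common Apache/Nginx "combined" format; the open platform's actual log layout may differ, and the file path is illustrative.

```python
import re
from collections import Counter

LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def summarize(log_lines):
    """Compute hit count, unique visitors, bytes transferred, and top contents."""
    hits, total_bytes = 0, 0
    visitors, paths = set(), Counter()
    for line in log_lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        hits += 1
        visitors.add(m.group("ip"))          # rough visitor proxy: unique IPs
        paths[m.group("path")] += 1
        if m.group("size") != "-":
            total_bytes += int(m.group("size"))
    return {"hits": hits, "unique_visitors": len(visitors),
            "bytes_transferred": total_bytes, "top_contents": paths.most_common(5)}

with open("access.log") as f:      # hypothetical log file path
    print(summarize(f))
```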

Analysis of Users' Inflow Route and Search Terms of the Korea National Archives' Web Site (국가기록원 웹사이트 유입경로와 이용자 검색어 분석)

  • Jin, Ju Yeong;Rieh, Hae-young
    • Journal of the Korean Society for Information Management
    • /
    • v.35 no.1
    • /
    • pp.183-203
    • /
    • 2018
  • As the users' information environment shifts to the Web, archives are providing more services on the Web than before. This study analyzes users' recent inflow routes to the Web site of the National Archives of Korea and the top 100 search terms of each month over ten and a half years, and suggests suitable information services. The analysis found that the inflow routes can be divided into access from portal sites, by country, from related institutions, and via mobile platforms. Analyzing the users' search terms over the ten and a half years, the most frequently searched term turned out to be 'Land Survey Register', which was also searched with steady interest throughout the period; other government documents and official gazettes were also of great interest to users. By identifying the most frequently and steadily searched terms, we were able to categorize the search terms broadly into land, the Japanese colonial period, the Korean War and inter-Korean relations, and records management and use. Based on these results, we suggested strengthening the connection of the National Archives Web site with portal sites and mobile platforms, and upgrading and improving its search services. This study confirms that the analysis of web logs and user search terms can yield meaningful results that enhance user services in archives.
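
A hypothetical sketch of how inflow routes and portal search terms might be derived from web log referrer URLs; the portal domains and query parameter names below are assumptions, not values reported in the study.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Assumed mapping of portal domains to their search-query parameter names.
PORTALS = {"search.naver.com": "query", "www.google.com": "q", "search.daum.net": "q"}

def classify_referrers(referrer_urls):
    """Count inflow routes and extract search terms from portal referrers."""
    routes, terms = Counter(), Counter()
    for url in referrer_urls:
        parsed = urlparse(url)
        if parsed.netloc in PORTALS:
            routes["portal"] += 1
            for term in parse_qs(parsed.query).get(PORTALS[parsed.netloc], []):
                terms[term] += 1
        elif parsed.netloc.startswith("m."):
            routes["mobile"] += 1
        else:
            routes["other"] += 1
    return routes, terms

routes, terms = classify_referrers([
    "https://search.naver.com/search.naver?query=토지조사부",
    "https://www.google.com/search?q=land+survey+register",
])
print(routes, terms.most_common(10))
```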