• Title/Summary/Keyword: DataLog

A Clustering Algorithm Considering Structural Relationships of Web Contents

  • Kang Hyuncheol;Han Sang-Tae;Sun Young-Su
    • Communications for Statistical Applications and Methods / v.12 no.1 / pp.191-197 / 2005
  • The application of data mining techniques to the World Wide Web, referred to as web mining, has been the focus of much recent research. With the explosive growth of information sources available on the web, it has become increasingly necessary to track and analyze their usage patterns. In this study, we introduce a procedure for pre-processing and cluster analysis of web log data and suggest a distance measure that takes into account the structural relationships between web contents. We also illustrate cluster analysis of web log data with real examples and examine practical applications of web usage mining for eCRM.

Estimation of Log-Odds Ratios for Incomplete $2{\times}2$ Tables with Covariates using FEFI

  • Kang, Shin-Soo;Bae, Je-Min
    • Journal of the Korean Data and Information Science Society / v.18 no.1 / pp.185-194 / 2007
  • Covariate information can be used in fully efficient fractional imputation (FEFI). A new method, FEFI with logistic regression, is proposed to construct complete contingency tables. The jackknife method is used to obtain standard errors of the log-odds ratio from the table completed by the new method. Simulation results show that, when the covariates carry more information about the categorical variables, the new method provides more efficient estimates of the log-odds ratio than either multiple imputation (MI) based on data augmentation or complete-case analysis.
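
As a minimal illustration of the two quantities named in this abstract (not the authors' FEFI procedure), the sketch below computes the log-odds ratio of an already complete 2×2 table and a delete-one jackknife standard error for it. The function names and the example counts are hypothetical, and every cell is assumed to hold at least two observations so that each reduced table still has positive counts.

```python
import math

def log_odds_ratio(table):
    """Log-odds ratio log(n11*n22 / (n12*n21)) of a 2x2 table [[n11, n12], [n21, n22]]."""
    (n11, n12), (n21, n22) = table
    return math.log(n11 * n22) - math.log(n12 * n21)

def jackknife_se(table):
    """Delete-one jackknife standard error of the log-odds ratio.

    Removing one observation from cell (i, j) gives the same replicate for
    every observation in that cell, so each cell's replicate is weighted by
    the cell count instead of looping over all n observations.
    Assumes every cell count is at least 2.
    """
    n = sum(sum(row) for row in table)
    replicates = []  # (estimate with one unit removed, number of such units)
    for i in range(2):
        for j in range(2):
            reduced = [list(row) for row in table]
            reduced[i][j] -= 1
            replicates.append((log_odds_ratio(reduced), table[i][j]))
    mean = sum(est * cnt for est, cnt in replicates) / n
    var = (n - 1) / n * sum(cnt * (est - mean) ** 2 for est, cnt in replicates)
    return math.sqrt(var)

# Hypothetical completed table, used only to exercise the functions.
example = [[20, 10], [5, 15]]
estimate = log_odds_ratio(example)   # equals log(6)
std_err = jackknife_se(example)
```

In the paper's setting the table would first be completed by imputation; this sketch starts from the completed counts and ignores the extra variability that imputation introduces.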

Default Bayesian one sided testing for the shape parameter in the log-logistic distribution

  • Kang, Sang Gil
    • Journal of the Korean Data and Information Science Society / v.26 no.6 / pp.1583-1592 / 2015
  • This paper deals with the problem of testing the shape parameter of the log-logistic distribution. We propose default Bayesian testing procedures for the shape parameter under reference priors. The reference prior is usually improper, which creates a calibration problem: the Bayes factor is defined only up to a multiplicative constant. This problem can be solved with the intrinsic Bayes factor and the fractional Bayes factor. We therefore propose default Bayesian testing procedures based on the fractional Bayes factor and the intrinsic Bayes factors under the reference priors. A simulation study and an example are provided.
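
To make the calibration problem concrete, the standard definition of the fractional Bayes factor (O'Hagan's formulation, stated here from general knowledge rather than taken from the paper) can be written as follows. The symbols $\pi_i^{N}$ (the improper prior under model $i$), $L_i$ (the likelihood), and $b$ (the fraction of the likelihood used for training, $0 < b < 1$) are notation introduced here for illustration.

```latex
B^{F}_{21}(b) = \frac{q_2(b, x)}{q_1(b, x)}, \qquad
q_i(b, x) = \frac{\int \pi_i^{N}(\theta_i)\, L_i(\theta_i \mid x)\, d\theta_i}
                 {\int \pi_i^{N}(\theta_i)\, L_i^{\,b}(\theta_i \mid x)\, d\theta_i},
\quad i = 1, 2 .
```

An improper prior is defined only up to an arbitrary constant $c_i$; because $c_i$ multiplies both the numerator and the denominator of $q_i(b, x)$, it cancels, so the resulting factor no longer depends on it.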

Default Bayesian testing for scale parameters in the log-logistic distributions

  • Kang, Sang Gil;Kim, Dal Ho;Lee, Woo Dong
    • Journal of the Korean Data and Information Science Society / v.26 no.6 / pp.1501-1511 / 2015
  • This paper deals with the problem of testing the equality of the scale parameters of log-logistic distributions. We propose default Bayesian testing procedures for the scale parameters under reference priors. The reference prior is usually improper, which creates a calibration problem: the Bayes factor is defined only up to a multiplicative constant. We therefore propose default Bayesian testing procedures based on the fractional Bayes factor and the intrinsic Bayes factor under the reference priors. A simulation study is provided to justify the proposed procedures, and an example is given.

An Optimal Parallel Sort Algorithm for Minimum Data Movement (최소 자료 이동을 위한 최적 병렬 정렬 알고리즘)

  • Hong, Seong-Su;Sim, Jae-Hong
    • The Transactions of the Korea Information Processing Society / v.1 no.3 / pp.290-298 / 1994
  • In this paper we propose a parallel sorting algorithm with $O(n^{n} \log n)$ time complexity, $O(n^{x} \log n)$ cost (parallel running time × number of processors), and $O(n^{1-x} + n^{x})$ data movement complexity under the ERWW-PRAM model. The methods for solving these problems are similar. The parallel algorithm finds pivots for partitioning the data into ordered subsets of approximately equal size by using encoding pointers.

User Information Needs Analysis based on Query Log Big Data of the National Archives of Korea (국가기록원 질의로그 빅데이터 기반 이용자 정보요구 유형 분석)

  • Baek, Ji-yeon;Oh, Hyo-Jung
    • Journal of the Korean Society for Information Management / v.36 no.4 / pp.183-205 / 2019
  • Among the various methods for identifying users' information needs, log analysis can realistically reflect users' actual search behavior and capture the overall usage of most users. Based on a large volume of query log data obtained through the portal service of the National Archives of Korea, this study analyzed the logs by information type and search result type in order to identify users' information needs. The query log used in the analysis consisted of 1,571,547 queries collected over a total of 141 months from 2007 to December 2018, during which the National Archives of Korea provided search services via the web. Furthermore, based on the analysis results, methods were proposed to improve user search satisfaction. The results of this study can be used in practice to improve and upgrade the National Archives of Korea search service.

An Associative Search System for Mobile Life-log Semantic Networks based on Visualization (시각화 기반 모바일 라이프 로그 시맨틱 네트워크 연관 검색 시스템)

  • Oh, Keun-Hyun;Kim, Yong-Jun;Cho, Sung-Bae
    • Journal of KIISE: Computing Practices and Letters / v.16 no.6 / pp.727-731 / 2010
  • Recently, mobile life-log data have been collected by mobile devices and used to record one's life. To help a user search these data, a mobile life-log semantic network has been introduced for storing logs and retrieving associated information. However, associative search systems on common semantic networks in previous studies present the retrieved data to users only as text. This paper proposes an associative search system for mobile life-log semantic networks that supports selection-based and keyword-based associative search; when a user enters a search keyword, both the search process and the result are visualized as a graph representing the associated data and their relationships. In addition, by using semantic abstraction, the system improves the user's understanding of the search results and simplifies the resulting graph. The system's usability was tested in an experiment comparing it with a text-based search system.

Mobile Gamer Categorization with Archetypal Analysis and Cognitive-Psychological Features from Log Data (로그 데이터의 유형분석 및 인지심리적 속성 추출을 이용한 모바일 게이머 유형화 연구)

  • Jeon, Jihoon;Yoon, Dumim;Yang, Seongil;Kim, Kyungjoong
    • Journal of KIISE / v.45 no.3 / pp.234-241 / 2018
  • Classifying gamer types and analyzing the characteristics of gamers is a field of interest for data analysis researchers, and much research has been done on gamer categorization and analysis. However, most studies use surveys or bio-signals, which is impractical because large amounts of such data are difficult to obtain. Even when game logs are used, it is difficult to analyze the psychology of gamers because they are categorized and analyzed using only extracted statistical values. If cognitive-psychological information can be extracted from the basic game log, gamers can be analyzed more intuitively and easily. In this paper, we extracted eight cognitive-psychological features representing the behavior and psychological information of gamers from the game log of Crazy Dragon, a mobile role-playing game (RPG). We then classified and analyzed gamers using these eight cognitive-psychological features. As a result, most gamers were highly correlated with one or two types.

Pre-Processing of Query Logs in Web Usage Mining

  • Abdullah, Norhaiza Ya;Husin, Husna Sarirah;Ramadhani, Herny;Nadarajan, Shanmuga Vivekanada
    • Industrial Engineering and Management Systems / v.11 no.1 / pp.82-86 / 2012
  • For the past few years, query log data have been collected to study users' behavior on a site. Many studies have used query logs to extract user preferences, recommend personalization, improve caching and pre-fetching of Web objects, build better adaptive user interfaces, and improve Web search in search engine applications. A query log contains data such as the client's IP address, the time and date of the request, the resource or page requested, the status of the request, the HTTP method used, and the type of browser and operating system. A query log can offer valuable insight into web site usage. Proper compilation and interpretation of query logs can provide baseline statistics that indicate the usage level of a website and can serve as a tool to assist decision making in management activities. In this paper we discuss the tasks performed on query logs in the pre-processing stage of web usage mining. We use query logs from an online newspaper company. In the pre-processing stage, the clickstream data are cleaned and partitioned into a set of user interactions that represent the activities of each user during their visits to the site. The essential pre-processing tasks applied to the query logs are data cleaning and user identification.
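
The two pre-processing tasks named in this abstract, data cleaning and user identification, can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the log layout is assumed to be the Apache-style combined log format, the cleaning rules (keep successful GET requests, drop static files and crawlers) and the user key (IP address plus user agent) are common heuristics, and the sample lines are invented.

```python
import re
from collections import defaultdict

# Assumed layout: Apache-style "combined" log format. The paper does not
# state the exact format of its logs, so this pattern is an assumption.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)
STATIC_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")

def clean(lines):
    """Data cleaning: keep successful GET requests for pages only."""
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # malformed line
        rec = match.groupdict()
        if rec["method"] != "GET" or not rec["status"].startswith("2"):
            continue  # non-GET or unsuccessful request
        if rec["url"].lower().split("?")[0].endswith(STATIC_SUFFIXES):
            continue  # embedded static resource, not a page view
        if "bot" in rec["agent"].lower():
            continue  # crude crawler filter
        yield rec

def identify_users(records):
    """User identification: treat each (IP, user agent) pair as one user."""
    users = defaultdict(list)
    for rec in records:
        users[(rec["ip"], rec["agent"])].append(rec["url"])
    return dict(users)

# Invented sample lines, used only to exercise the functions.
sample = [
    '10.0.0.1 - - [01/Jan/2012:10:00:00 +0800] "GET /news/1 HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [01/Jan/2012:10:00:01 +0800] "GET /img/logo.png HTTP/1.1" 200 900 "-" "Mozilla/5.0"',
    '10.0.0.2 - - [01/Jan/2012:10:00:05 +0800] "GET /news/2 HTTP/1.1" 404 0 "-" "Opera/9.80"',
    '10.0.0.1 - - [01/Jan/2012:10:01:00 +0800] "GET /news/3 HTTP/1.1" 200 700 "-" "Mozilla/5.0"',
]
users = identify_users(clean(sample))
```

Keying users on the IP address and user agent together is a heuristic: it separates different browsers behind one shared address but cannot distinguish two people using identical browsers behind the same proxy.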

Diagnosis Analysis of Patient Process Log Data (환자의 프로세스 로그 정보를 이용한 진단 분석)

  • Bae, Joonsoo
    • Journal of Korean Society of Industrial and Systems Engineering / v.42 no.4 / pp.126-134 / 2019
  • Nowadays, with big data available everywhere, various analysis methods such as data mining can be applied to them to find useful information for improving design and operation. In particular, if we have event log data that record the execution history of an organization, such as case_id, event_time, event (activity), and performer, we can apply process mining to discover the main process model of the organization. Once the main process has been found, it can be used to improve the current working environment. In this paper we developed a new process mining method to find the final diagnosis of a patient who needs several procedures (medical tests and examinations) for a disease to be diagnosed. Some patients can be diagnosed with only one procedure, but others are very difficult to diagnose and need several procedures before the exact disease name is found. We used 2 million procedure log records, which include 397 thousand patients who underwent two or more procedures before a final diagnosis was reached. Such multi-procedure patients are not frequent cases, but they are critical for preventing wrong diagnoses. From these multi-procedure patients, four procedures were discovered to form the main process model in the hospital. Using this main process model, we can understand the sequence of procedures in the hospital and, furthermore, the relationship between diagnoses and the corresponding procedures.
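
The idea of turning an event log into per-case procedure sequences can be sketched as below. This is a deliberately simplified stand-in (counting trace variants) rather than the process discovery method used in the paper. The field names case_id, event_time, and event follow the abstract; the function names, procedure names, and records are invented for illustration.

```python
from collections import Counter, defaultdict

def traces(event_log):
    """Group events by case_id and order each case's events by event_time."""
    by_case = defaultdict(list)
    for case_id, event_time, event in event_log:
        by_case[case_id].append((event_time, event))
    return {
        case: tuple(event for _, event in sorted(events))
        for case, events in by_case.items()
    }

def multi_procedure_variants(event_log):
    """Count procedure sequences among cases with two or more procedures."""
    return Counter(t for t in traces(event_log).values() if len(t) >= 2)

# Invented records: (case_id, event_time, event). ISO timestamps sort
# correctly as plain strings.
log = [
    ("p1", "2019-01-01T10:00", "x_ray"),
    ("p1", "2019-01-01T09:00", "blood_test"),
    ("p2", "2019-01-02T09:00", "blood_test"),
    ("p2", "2019-01-02T11:00", "x_ray"),
    ("p3", "2019-01-03T09:00", "x_ray"),
]
main_variant = multi_procedure_variants(log).most_common(1)
```

Here the most frequent sequence among multi-procedure cases plays the role of a "main process"; a real process mining tool would instead build a process model from the directly-follows relations between events.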