• Title/Summary/Keyword: Log Clustering

Functional clustering for electricity demand data: A case study (시간단위 전력수요자료의 함수적 군집분석: 사례연구)

  • Yoon, Sanghoo;Choi, Youngjean
    • Journal of the Korean Data and Information Science Society / v.26 no.4 / pp.885-894 / 2015
  • Forecasting electricity demand is necessary for reliable and effective operation of the power system. In this study, we categorize functional data, namely the mean curve of the daily power demand pattern over the hours of the day. The data were collected between January 1, 2009 and December 31, 2011. They were converted, through a log transformation and trend removal, to time series data consisting of seasonal components and an error component. The functional clustering of Ma et al. (2006) is applied, and the parameters are estimated using the EM algorithm and generalized cross validation. The number of clusters is determined by classifying days as holidays or weekdays. The mean curves of the daily power demand pattern are described by day type (Monday; weekday, Tuesday to Friday; Saturday; Sunday or holiday) and by season.

Energy-Aware Self-Stabilizing Distributed Clustering Protocol for Ad Hoc Networks: the case of WSNs

  • Ba, Mandicou;Flauzac, Olivier;Haggar, Bachar Salim;Makhloufi, Rafik;Nolot, Florent;Niang, Ibrahima
    • KSII Transactions on Internet and Information Systems (TIIS) / v.7 no.11 / pp.2577-2596 / 2013
  • In this paper, we present an Energy-Aware Self-Stabilizing Distributed Clustering protocol based on the message-passing model for Ad Hoc networks. The protocol does not require any initialization: starting from an arbitrary configuration, the network converges to a stable state in finite time. Our contribution is twofold. First, we give a formal proof that stabilization is reached after at most $n+2$ transitions and requires at most $n \times \log(2n + \kappa + 3)$ memory space, where $n$ is the number of network nodes and $\kappa$ is the maximum number of hops in the clusters. We also evaluate our approach using the OMNeT++ simulator. Second, we propose an adaptation of our solution to Wireless Sensor Networks (WSNs) with energy constraints. In particular, we show that our protocol can easily be used to construct clusters according to multiple criteria for electing cluster-heads (CHs), such as node identity, residual energy, or degree. We compare the different election metrics by evaluating their communication cost and energy consumption. Simulation results show that, in terms of the number of exchanged messages and energy consumption, it is better to use the Highest-ID metric for electing CHs.
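The two bounds quoted in this abstract can be evaluated for a concrete network size. The sketch below is only illustrative arithmetic: the function name `stabilization_bounds` is hypothetical, and the logarithm base (2 here) is an assumption, since the abstract does not state it.

```python
import math


def stabilization_bounds(n: int, kappa: int, log_base: float = 2.0):
    """Return (max_transitions, memory_bound) for n nodes and at most kappa hops.

    Assumption: the logarithm is taken in base `log_base` (2 by default);
    the abstract does not say which base is meant.
    """
    if n < 1 or kappa < 0:
        raise ValueError("n must be >= 1 and kappa must be >= 0")
    max_transitions = n + 2                                    # bound n + 2
    memory_bound = n * math.log(2 * n + kappa + 3, log_base)   # n * log(2n + kappa + 3)
    return max_transitions, memory_bound


transitions, memory = stabilization_bounds(n=100, kappa=5)
print(transitions)  # 102
print(memory)
```

Both bounds grow slowly with the network size: the transition bound is linear in n, and the memory bound is n times a logarithmic factor.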

Product Recommendation System on VLDB using k-means Clustering and Sequential Pattern Technique (k-means 클러스터링과 순차 패턴 기법을 이용한 VLDB 기반의 상품 추천시스템)

  • Shim, Jang-Sup;Woo, Seon-Mi;Lee, Dong-Ha;Kim, Yong-Sung;Chung, Soon-Key
    • The KIPS Transactions:PartD / v.13D no.7 s.110 / pp.1027-1038 / 2006
  • Recommendation systems based on a very large database (VLDB) face many technical problems, so it is necessary to study a recommendation system structure and data-mining techniques suited to large-scale Internet shopping malls. We therefore design and implement a product recommendation system, usable in a large-scale Internet shopping mall, that combines the k-means clustering algorithm with a sequential pattern technique. The system processes user information in batches, defines the various categories in a hierarchical structure, and uses a sequential pattern mining technique for the search engine. For predictive modeling and experiments, we use real data (users' interests and preferences for a given category) extracted from 30 days of log files of a major Internet shopping mall in Korea. For evaluation, we define PRP (Predictive Recommend Precision), PRR (Predictive Recommend Recall), and PF1 (Predictive Factor One-measure). In the experiments, the best recommendation time and the best learning time of our system are on the order of O(N), and the values of the evaluation measures are very good.

Squared Log-return and TGARCH Model : Asymmetric Volatility in Domestic Time Series (제곱수익률 그래프와 TGARCH 모형을 이용한 비대칭 변동성 분석)

  • Park, J.A.;Song, Y.J.;Baek, J.S.;Hwang, S.Y.;Choi, M.S.
    • The Korean Journal of Applied Statistics / v.20 no.3 / pp.487-497 / 2007
  • As pointed out by Gourieroux (1997), volatility effects in financial time series vary with the sign of the return rate, so asymmetric Threshold-GARCH (TGARCH, henceforth) processes are natural extensions of the standard GARCH toward asymmetric volatility modeling. For preliminary detection of asymmetry in volatility, we suggest graphs of squared log-returns for various financial time series, including the KOSPI, the KOSDAQ, and the won-Euro exchange rate. Next, asymmetric TGARCH(1,1) model fits are compared with standard GARCH(1,1) models.
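A squared log-return series of the kind mentioned in this abstract can be computed in a few lines. The sketch below is a minimal illustration, not the authors' procedure: the helper names are hypothetical, and grouping the next period's squared return by the sign of the current return is an assumed simple check for asymmetry, standing in for the graphs the paper proposes.

```python
import math


def log_returns(prices):
    """Log-returns r_t = ln(p_t / p_{t-1}) for a list of positive prices."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be positive")
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def squared_log_returns(prices):
    """Squared log-returns r_t ** 2, a common rough proxy for volatility."""
    return [r ** 2 for r in log_returns(prices)]


def next_squared_return_by_sign(prices):
    """Mean of r_{t+1} ** 2, split by whether r_t was negative or positive.

    Assumed illustrative check: a clearly larger mean after negative returns
    would hint at the asymmetry that a TGARCH model is designed to capture.
    """
    r = log_returns(prices)
    after_neg = [nxt ** 2 for cur, nxt in zip(r, r[1:]) if cur < 0]
    after_pos = [nxt ** 2 for cur, nxt in zip(r, r[1:]) if cur > 0]

    def mean(xs):
        return sum(xs) / len(xs) if xs else float("nan")

    return mean(after_neg), mean(after_pos)


# Made-up price series, used only to show the calls.
prices = [100.0, 110.0, 99.0, 108.9]
print(squared_log_returns(prices))
print(next_squared_return_by_sign(prices))
```

On real index data the two means would be compared over a long sample; the four-point series here only demonstrates the mechanics.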

Determinants of Consumer Preference by type of Accommodation: Two Step Cluster Analysis (이단계 군집분석에 의한 농촌관광 편의시설 유형별 소비자 선호 결정요인)

  • Park, Duk-Byeong;Yoon, Yoo-Shik;Lee, Min-Soo
    • Journal of Global Scholars of Marketing Science / v.17 no.3 / pp.1-19 / 2007
  • 1. Purpose Rural tourism is undertaken by individuals with different characteristics, needs, and wants. It is important to have information on the characteristics and preferences of the consumers of the different types of existing rural accommodation. The study aims to identify the determinants of consumer preference by type of accommodation. 2. Methodology 2.1 Sample Data were collected from 1000 people by telephone survey with three-stage stratified random sampling in seven metropolitan areas in Korea. Respondents were chosen at a sampling interval from the telephone book published in 2006. The survey was conducted between 4:00 and 10:30 p.m., using systematic sampling that takes respondents' life cycle into account. 2.2 Two-step Cluster Analysis Our study uses a two-step cluster method to classify the accommodation into a reduced number of groups, so that each group constitutes a type. This method has been suggested as appropriate for clustering large data sets with mixed attributes. It is based on a distance measure that enables data with both continuous and categorical attributes to be clustered. The measure is derived from a probabilistic model in which the distance between two clusters is equivalent to the decrease in the log-likelihood function that results from merging them. 2.3 Multinomial Logit Analysis Estimating a Multinomial Logit model determines the characteristics of the tourists most likely to opt for each type of accommodation. The Multinomial Logit model is an appropriate framework for exploring and explaining choice processes in which the choice set consists of more than two alternatives. Because its parameters are easy and quick to estimate, the Multinomial Logit model has been used in many empirical studies of choice in tourism. 3. Findings The auto-clustering algorithm indicated that a five-cluster solution was the best model, because it minimized the BIC value and the change in BIC between adjacent numbers of clusters. 
The accommodation establishments can be classified into five types: Traditional House, Typical Farmhouse, Farmstay House for Group Tour, Log Cabin for Family, and Log Cabin for Individuals. Group 1 (Traditional House) includes mainly the large accommodation establishments, i.e. those with Ondol-style rooms that provide meals and one shower room for family tourists, in houses of the original construction style. Group 2 (Typical Farmhouse) encompasses accommodation establishments with Ondol rooms, a bathroom for each room, and meals provided. It includes, in other words, the tourist accommodations known as "rural houses." Group 3 (Farmstay House for Group) has accommodation establishments with Ondol rooms, no meals provided, self-cooking facilities, and large rooms for more than five persons. Group 4 (Log Cabin for Family) includes mainly the popular accommodation establishments, i.e. those with Ondol-style rooms and one shower room for family tourists, in Western-style log houses. While the accommodations in this group are not defined as regards type of construction, the group does include all the original Korean-style construction. Finally, Group 5 (Log Cabin for Individuals) includes Western-style wooden houses with bedrooms, each with its own bathroom. First, a Multinomial Logit model is estimated that includes all the explanatory variables considered and takes accommodation group 2 as the base alternative. The results show the variables and the estimated parameter values for the model giving the probability of each of the five types of accommodation available in rural tourism villages in Korea, according to the socio-economic and trip-related characteristics of the individuals. An initial observation of the analysis reveals that none of the variables income, number of journeys, distance, and residential style of house is explanatory in the choice of rural accommodation. The age and travel-companion variables are significant for accommodation establishments of group 1. 
The education and rural residential experience variables are significant for accommodation establishments of groups 4 and 5. The expenditure and marital status variables are significant for accommodation establishments of group 4. The gender and occupation variables are significant for accommodation establishments of group 3. The loyalty variable is significant for accommodation establishments of groups 3 and 4. The study indicates that significant differences exist among the individuals who choose each type of accommodation at a destination. From this investigation it is evident that a rural destination can attract several profiles of tourists, depending on the types of accommodation existing at the destination. In addition, the tourist profiles may be used as the basis for investment policy and promotion for each type of accommodation, making use in each case of the variables that indicate a greater likelihood of influencing the tourist's choice of accommodation.

Mobile App Analytics using Media Repertoire Approach (미디어 레퍼토리를 이용한 스마트폰 애플리케이션 이용 패턴 유형 분석)

  • Kwon, Sung Eun;Jang, Shu In;Hwangbo, Hyunwoo
    • The Journal of Society for e-Business Studies / v.26 no.4 / pp.133-154 / 2021
  • Today the smartphone is the most common medium, with the 'application' as its vehicle. To understand how media users select applications and build their repertoires, this study applied a two-step approach to big data from smartphone logs covering 4 weeks in November 2019 and identified 8 media repertoire groups. Each of the eight groups differed from the others in time spent per mobile application category and in demographic distribution. Beyond its academic contribution of identifying mobile application repertoires from large-scale behavioral data, this study is significant in proposing a two-step approach that overcomes the 'outlier issue' in behavioral data by extracting prototype vectors with a SOM (Self-Organizing Map) and applying them to k-means clustering to optimize the classification. The study is also meaningful in that it categorizes customers of e-commerce services, identifies customer structure from behavioral data, and provides practical guidance to e-commerce communities that make service or marketing decisions for each customer group.

A Personal Memex System Using Uniform Representation of the Data from Various Devices (다양한 기기로부터의 데이터 단일 표현을 통한 개인 미멕스 시스템)

  • Min, Young-Kun;Lee, Bog-Ju
    • The KIPS Transactions:PartB / v.16B no.4 / pp.309-318 / 2009
  • Research on systems that automatically record and retrieve one's everyday life has been relatively active recently. These systems, called personal memex or life log systems, usually entail dedicated devices such as the SenseCam in the MyLifeBits project. This research focuses instead on digital devices that people use every day, such as mobile phones, credit cards, and digital cameras. The system enables a person to systematically store the records of everyday life that are saved in the devices or on device-related web pages (e.g., phone records at the cellular phone company) and to refer to them quickly later. The data collection agent in the proposed system, called MyMemex, collects the personal life log "web data" using the web services that the web sites provide and stores the web data on the server. The "file data" stored in off-line digital devices are also loaded onto the server. Each item of file data or web data is viewed as a memex event that can be described in 4W1H form. The different types of data in different services are transformed into memex event data in 4W1H form, using the memex event ontology. Users can sign in to the web server of this service to view their life logs in chronological order. Users can also search the life logs using keywords. Moreover, the life logs can be viewed in a diary or story style by converting the memex events to sentences. Related memex events are grouped and displayed as an "episode" by a heuristic identification method. An experiment on episode identification using the real life log data of one of the authors yielded high accuracy.

Automatic Clustering Agent using PCA and SOM (PCA와 SOM을 이용한 자동 군집화 에이전트)

  • 박정은;김병진;오경환
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.09b / pp.67-70 / 2003
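  • Amid the flood of information on the Internet, obtaining the desired information accurately and on time is not easy, and so the role of agents that perform this task on the user's behalf is growing. In an Internet environment where most events occur and must be processed in real time, it is difficult for an analyst to stay continuously involved in choosing the clustering method and interpreting the results, so an intelligent agent that takes over the analyst's work is needed. This paper proposes an automatic clustering agent as an automated system for unsupervised clustering in particular; the system consists of a clustering agent and a clustering performance evaluation agent. The two agents exchange information with each other and automatically perform optimal clustering. Because it is difficult for the analyst to intervene in real time in the clustering method and the interpretation of results during the clustering process, having an intelligent agent take charge of automated clustering can be an effective clustering strategy. In addition, the performance of the proposed system was evaluated through experiments using the IRIS data from the UCI Machine Repository and the Microsoft Web Log Data.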

Development of Qualification Analysis Preliminary Frame for Railway Personal Injury Accident (철도 사상사고 위험도 평가를 위한 정량화 분석 기초모델 개발)

  • Park, Chan-Woo;Wang, Jong-Bae;Park, Joo-Nam;Kwak, Sang-Log
    • Proceedings of the KSR Conference / 2007.05a / pp.1227-1232 / 2007
  • The objective of this study is to develop a preliminary quantification analysis framework for railway personal injury accidents. In this research, we develop accident scenarios in order to systematically analyze and quantitatively evaluate fatality accident scenarios for railway personal injury. The accident scenario analysis first identifies the hazardous events and then explains the hazardous conditions that surround the accident and cause railway accidents. The method includes a feasibility test, a clustering process, and a patterning process for a clearer understanding of the accident situation. Because it enables accident scenario analysis to be performed systematically as well as objectively, the method is useful in building better accident prevention strategies. This study could therefore help reduce railway accidents and serve as an effective tool for hazard analysis.

Web Log Analysis Technique using Fuzzy C-Means Clustering (Fuzzy C-Means클러스터링을 이용한 웹 로그 분석기법)

  • 김미라;곽미라;조동섭
    • Proceedings of the Korean Information Science Society Conference / 2002.04b / pp.550-552 / 2002
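  • Clustering is a methodology for dividing the patterns of a given data set into groups with similar properties in order to establish the relationships among the patterns. Many algorithms have been developed for this purpose, and they are widely applied in engineering fields such as pattern recognition and image processing. The FCM (Fuzzy C-Means) algorithm is based on iterative optimization of an objective function obtained by applying fuzzy theory to the least square criterion function. Whereas conventional clustering methods based on hard partitioning take a winner-take-all approach, FCM assigns each pattern a degree of membership in each cluster, which helps form more accurate information. In this paper, we perform web log analysis using the FCM technique.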
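The alternating update at the heart of Fuzzy C-Means can be sketched in plain Python. This is a minimal textbook-style version, not the authors' web log pipeline: the function name, the fuzzifier default m = 2, the random initialisation, and the toy points are all assumptions made for illustration.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Minimal Fuzzy C-Means: returns (centers, memberships).

    memberships[i][j] is the degree to which point i belongs to cluster j;
    each row sums to 1. Requires m > 1 and Python 3.8+ (math.dist).
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Start from random memberships, normalised so every row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )
        # Membership update: u_ij proportional to d_ij ** (-2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            dists = [math.dist(points[i], centers[j]) for j in range(c)]
            if any(d == 0.0 for d in dists):
                # Point coincides with a center: give it full membership there.
                new = [1.0 if d == 0.0 else 0.0 for d in dists]
            else:
                new = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(new)
            new = [v / total for v in new]
            shift = max(shift, max(abs(a - b) for a, b in zip(new, u[i])))
            u[i] = new
        if shift < tol:  # stop once memberships have stabilised
            break
    return centers, u


# Toy data: two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, memberships = fuzzy_c_means(points, c=2)
for p, row in zip(points, memberships):
    print(p, [round(v, 3) for v in row])
```

Unlike a hard partition, every point keeps a graded membership in both clusters; taking the largest membership per row recovers a winner-take-all assignment when one is needed.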
