• Title/Summary/Keyword: Access Patterns

Analysis and Evaluation of Frequent Pattern Mining Technique based on Landmark Window (랜드마크 윈도우 기반의 빈발 패턴 마이닝 기법의 분석 및 성능평가)

  • Pyun, Gwangbum;Yun, Unil
    • Journal of Internet Computing and Services / v.15 no.3 / pp.101-107 / 2014
  • With the growth of online services, databases have shifted from static structures to dynamic stream structures. Earlier data mining techniques served as decision-making tools for tasks such as establishing marketing strategies and analyzing DNA. However, emerging areas such as sensor networks, robotics, and artificial intelligence require real-time data to be analyzed more quickly. Landmark window-based frequent pattern mining, one of the stream mining approaches, performs mining operations on parts of a database or on each transaction, instead of on all the data. In this paper, we analyze and evaluate two well-known landmark window-based frequent pattern mining algorithms, Lossy counting and hMiner. When Lossy counting mines frequent patterns from a set of new transactions, it performs union operations between the previous and current mining results. hMiner, a state-of-the-art algorithm based on the landmark window model, conducts mining operations whenever a new transaction arrives. Since hMiner extracts frequent patterns as soon as a new transaction is entered, its latest mining results reflect real-time information. For this reason, such algorithms are also called online mining approaches. We evaluate and compare the performance of the earlier algorithm, Lossy counting, and the more recent one, hMiner. As criteria for our performance analysis, we first consider total runtime and average processing time per transaction. To compare the efficiency of their storage structures, we also evaluate maximum memory usage. Lastly, we show how stably the two algorithms perform on databases with a gradually increasing number of items.
In terms of mining time and transaction processing, hMiner is faster than Lossy counting. Since hMiner stores candidate frequent patterns in a hash structure, it can access them directly. Lossy counting, in contrast, stores them in a lattice and therefore has to traverse multiple nodes to reach a candidate frequent pattern. On the other hand, hMiner performs worse than Lossy counting in terms of maximum memory usage. hMiner must keep the full information for each candidate frequent pattern in order to store it in a hash bucket, while the lattice used by Lossy counting reduces the stored information. Because the storage of Lossy counting can share items that occur in multiple patterns, its memory usage is more efficient than that of hMiner. However, hMiner is more efficient than Lossy counting in the scalability evaluation, for the following reasons. As the number of items increases, the number of shared items decreases, which weakens Lossy counting's memory efficiency. Furthermore, as the number of transactions grows, its pruning becomes less effective. From the experimental results, we conclude that landmark window-based frequent pattern mining algorithms are suitable for real-time systems, although they require a significant amount of memory. Hence, their data structures need to be made more efficient before they can also be used in resource-constrained environments such as wireless sensor networks (WSNs).
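To make the landmark-window idea concrete, the following is a minimal sketch of the classic Lossy Counting scheme applied to single items rather than to itemsets. It is a simplification under stated assumptions: the algorithms compared in the abstract track candidate patterns in a lattice or hash structure, which is not reproduced here, and the names `lossy_count`, `frequent_items`, `epsilon`, and `support` are illustrative only.

```python
import math


def lossy_count(stream, epsilon):
    """Approximate item counts over everything seen since the landmark (start).

    Simplified single-item variant; epsilon is the allowed counting error.
    Returns (entries, n) where entries maps item -> [count, max_undercount].
    """
    width = math.ceil(1 / epsilon)  # number of elements per bucket
    entries = {}
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)  # id of the bucket this element falls in
        if item in entries:
            entries[item][0] += 1
        else:
            # A new entry may have been missed in up to (bucket - 1) earlier buckets.
            entries[item] = [1, bucket - 1]
        if n % width == 0:
            # Bucket boundary: prune entries that cannot be frequent so far.
            stale = [k for k, (count, delta) in entries.items() if count + delta <= bucket]
            for key in stale:
                del entries[key]
    return entries, n


def frequent_items(entries, n, support, epsilon):
    """Report items whose stored count reaches (support - epsilon) * n."""
    return sorted(k for k, (count, _) in entries.items() if count >= (support - epsilon) * n)


if __name__ == "__main__":
    # Hypothetical stream: "a" on every 2nd element, "b" on every 4th, the rest unique.
    stream = ["a" if i % 2 == 0 else "b" if i % 4 == 1 else f"x{i}" for i in range(100)]
    entries, n = lossy_count(stream, epsilon=0.1)
    print(frequent_items(entries, n, support=0.2, epsilon=0.1))
```

Pruning at bucket boundaries is what keeps memory bounded: items that appear only once are dropped at the next boundary, while recurring items survive with an undercount of at most `epsilon * n`.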

Finding Weighted Sequential Patterns over Data Streams via a Gap-based Weighting Approach (발생 간격 기반 가중치 부여 기법을 활용한 데이터 스트림에서 가중치 순차패턴 탐색)

  • Chang, Joong-Hyuk
    • Journal of Intelligence and Information Systems / v.16 no.3 / pp.55-75 / 2010
  • Sequential pattern mining aims to discover interesting sequential patterns in a sequence database. It is one of the essential data mining tasks and is widely used in application fields such as Web access pattern analysis, customer purchase pattern analysis, and DNA sequence analysis. General sequential pattern mining considers only the generation order of the data elements in a sequence, so it can easily find simple sequential patterns but is limited in finding the more interesting sequential patterns needed in real-world applications. One of the essential research topics for overcoming this limit is weighted sequential pattern mining, which considers not only the generation order of data elements but also their weights. Recently, data has increasingly taken the form of continuous data streams rather than finite stored data sets in many application fields, and the database research community has begun to focus on processing data streams. A data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. In data stream processing, each data element should be examined at most once, and the memory used for the analysis should remain bounded even though new data elements are continuously generated. Moreover, newly generated data elements should be processed as fast as possible to keep the analysis result up to date, so that it can be used immediately upon request. To satisfy these requirements, data stream processing sacrifices some correctness of its analysis result by allowing a degree of error. In response to these changes in the form of real-world data, much research has been performed to find the various kinds of knowledge embedded in data streams.
This research mainly focuses on efficient mining of frequent itemsets and sequential patterns over data streams, which have proven useful in conventional data mining of finite data sets. In addition, mining algorithms have been proposed that efficiently reflect changes in a data stream over time in their mining results. However, these studies have targeted only straightforward patterns such as frequent patterns and simple sequential patterns, which are found intuitively, and have paid little attention to mining novel patterns that better express the characteristics of the target data streams. Defining such novel patterns and developing a mining method to find them is therefore a valuable research topic in data stream mining. This paper proposes a gap-based weighting approach for sequential patterns and a method for mining weighted sequential patterns over sequence data streams using that approach. The gap-based weight of a sequential pattern can be computed from the gaps between the data elements in the pattern, without any pre-defined weight information. That is, the approach uses the gaps between data elements in each sequential pattern, as well as their generation order, to obtain the weight of the pattern, which helps to find more interesting and useful sequential patterns. Because most computer application fields now generate data as data streams rather than as finite data sets, the proposed method mainly focuses on sequence data streams.

A study on Locational and Regional Pattern of Leisure Facilities at Kangnam-gu, Seoul (서울시의 활동여가시설의 입지유형에 관한 연구 - 강남구를 중심으로 -)

  • Choi, Woun-Sik;Kim, Min
    • Journal of the Korean Regional Science Association / v.10 no.2 / pp.17-29 / 1994
  • This study examines the regional distribution and locational pattern of leisure facilities in Kangnam-gu, Seoul. For convenience of analysis, the facilities are divided into public and private sectors and then classified into 11 types: mineral spring resort, play ground, neighborhood park, swimming pool, gymnasium, bowling, pingpong, aerobic, golf practice, health, and billiard facilities. Data were collected from the 1993 statistical yearbook and from the lists of registered facilities at the department of living physics of the Kangnam-gu office. The density of facilities and the opportunity of facility use, per facility type and per region, are analysed using GIS. The results may be summarized as follows. First, the correlation between the results of the Location-Allocation model and those of the Interaction model is very high. Secondly, comparing the density of facilities with the opportunity of facility use across the eleven facility types reveals three discrete spatial patterns. The mineral spring resort type has the most unbalanced density and opportunity of facility use. The play ground, neighborhood park, swimming pool, gymnasium, bowling, pingpong, and aerobic facility types have highly unbalanced density and opportunity of facility use. The golf practice, health, and billiard facility types have spatially balanced density and opportunity of facility use. Thirdly, comparing density and opportunity of facility use per 'dong' administrative unit, the spatial patterns of public and private facilities differ in density but are similar in opportunity of facility use.
Fourthly, the patterns of facility users have different characteristics depending on facility use time, expense, residence, and access time, and four regional patterns are found: user favorable, facility profitable, user balanced, and unfavorable.

Geographic Disparities in Prostate Cancer Outcomes - Review of International Patterns

  • Baade, Peter D.;Yu, Xue Qin;Smith, David P.;Dunn, Jeff;Chambers, Suzanne K.
    • Asian Pacific Journal of Cancer Prevention / v.16 no.3 / pp.1259-1275 / 2015
  • Background: This study reviewed the published evidence on how prostate cancer outcomes vary by geographical remoteness and area-level disadvantage. Materials and Methods: A review of the literature published from January 1998 to January 2014 was undertaken; the Medline and CINAHL databases were searched from February to May 2014. The search terms included 'Prostate cancer' and 'prostatic neoplasms' coupled with 'rural health', 'urban health', 'geographic inequalities', 'spatial', 'socioeconomic', 'disadvantage', 'health literacy' or 'health service accessibility'. Outcome-specific terms were 'incidence', 'mortality', 'prevalence', 'survival', 'disease progression', 'PSA testing' or 'PSA screening', 'treatment', 'treatment complications' and 'recurrence'. A further search through internet search engines was conducted to identify any additional relevant published reports. Results: 91 papers were included in the review. While patterns were sometimes contrasting, the predominant patterns were for PSA testing to be more common in urban (5 studies out of 6) and affluent areas (2 of 2), higher prostate cancer incidence in urban (12 of 22) and affluent (18 of 20), greater risk of advanced stage prostate cancer in rural (7 of 11) and disadvantaged (8 of 9), higher survival in urban (8 of 13) and affluent (16 of 18), greater access or use of definitive treatment services in urban (6 of 9) and affluent (7 of 7), and higher prostate mortality in rural (10 of 20) and disadvantaged (8 of 16) areas. Conclusions: Future studies may need to use a mixed methods approach, in which the quantifiable attributes of the individuals living within areas are measured along with the characteristics of the areas themselves, and which, importantly, includes a qualitative examination of the lived experience of people within those areas.
These studies should be conducted across a range of countries using consistent measures and should incorporate dialogue between clinicians, epidemiologists, policy advocates and disease control specialists.

Memory Controller Architecture with Adaptive Interconnection Delay Estimation for High Speed Memory (고속 메모리의 전송선 지연시간을 적응적으로 반영하는 메모리 제어기 구조)

  • Lee, Chanho;Koo, Kyochul
    • Journal of IKEEE / v.17 no.2 / pp.168-175 / 2013
  • When memory controllers and high-speed memories are soldered onto a PCB, the delay of data propagating across the board depends on the shape and length of the interconnection lines. This dependency on placement and routing requires either redesigning the I/O logic or, if the controller is programmable, reconfiguring the memory controller after the delay has been measured. In this paper, we propose an architecture for configuration logic that estimates the delay by writing and reading test patterns while the memories are being initialized. The configuration logic writes test patterns to the memory and reads them back while changing the timing until the correct patterns are read. The timing information is stored, and the configuration logic configures the memory controller at the end of initialization. The proposed method eases the design of PCB-based systems by resolving the mismatch caused by variation in the placement and routing of components, including memories and memory controllers. It can be applied to high-speed SRAM, DRAM, and flash memory.
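The write-then-sweep procedure described in this abstract can be illustrated with a small software simulation. This is only a sketch under stated assumptions: the proposed design is hardware logic, and the `SimulatedMemory` model, the two-step valid sampling window, the test pattern values, and the function name `calibrate_read_delay` are all hypothetical.

```python
# Hypothetical test patterns with alternating and solid bit values.
TEST_PATTERNS = [0x55555555, 0xAAAAAAAA, 0x00000000, 0xFFFFFFFF]


class SimulatedMemory:
    """Toy memory whose read data is only valid inside a sampling window."""

    def __init__(self, board_delay, window=2):
        self.board_delay = board_delay  # unknown to the controller
        self.window = window            # assumed width of the valid window
        self.cells = {}

    def write(self, addr, value):
        self.cells[addr] = value

    def read(self, addr, sample_delay):
        value = self.cells.get(addr, 0)
        if self.board_delay <= sample_delay < self.board_delay + self.window:
            return value
        # Sampling too early or too late returns corrupted data (all bits flipped).
        return value ^ 0xFFFFFFFF


def calibrate_read_delay(memory, max_delay):
    """Sweep the sampling delay until every test pattern reads back correctly.

    Returns the first working delay setting, or None if no setting works.
    """
    for addr, pattern in enumerate(TEST_PATTERNS):
        memory.write(addr, pattern)
    for delay in range(max_delay + 1):
        if all(memory.read(addr, delay) == pattern
               for addr, pattern in enumerate(TEST_PATTERNS)):
            return delay
    return None


if __name__ == "__main__":
    memory = SimulatedMemory(board_delay=5)
    print("configured delay setting:", calibrate_read_delay(memory, max_delay=16))
```

The sketch stops at the first passing setting, mirroring the abstract's "until the correct patterns are read"; a real controller might instead pick the middle of the passing window for extra margin.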

An Optimization Technique for Irregular Data Access Patterns on Software Controlled On-Chip Memory SubSystems (소프트웨어 제어 온칩 메모리 서브시스템에서 불규칙 데이터 접근 패턴 최적화 기법)

  • Cho, Doo-San;Cho, Jung-Seok
    • Proceedings of the Korean Information Science Society Conference / 2012.06a / pp.212-214 / 2012
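  • Most data-intensive applications contain both regular and irregular memory access patterns in their kernel code. Until now, most memory access pattern optimization techniques have concentrated on regular patterns. In cryptography and communication applications, however, irregular patterns often account for the majority of memory accesses. Designing a generalized optimization technique that uses on-chip memory efficiently for such irregular memory access patterns is difficult, and as a result the related research area has seen little progress. To address the irregular memory access pattern optimization problem, we propose a data clustering technique. Clustering computes the temporal and spatial locality of the accessed data, groups the data with the largest benefit into a single block, and uses that block as the basic unit kept resident in on-chip memory. With this technique, energy consumption can be reduced by about 19% compared with a conventional cache memory.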

History Document Image Background Noise and Removal Methods

  • Ganchimeg, Ganbold
    • International Journal of Knowledge Content Development & Technology / v.5 no.2 / pp.11-24 / 2015
  • Archive libraries commonly provide public access to collections of historical and ancient document images. Such images often require specialized processing to remove background noise and become more legible. Document images may be contaminated with noise during transmission, scanning, or conversion to digital form. Noise can be categorized by its features, and similar patterns can be searched for in a document image to choose appropriate removal methods. In this paper, we propose a hybrid binarization approach that improves the quality of old documents by combining global and local thresholding. This article also reviews the kinds of noise that can appear in scanned document images and discusses some noise removal methods.

Development of a Web Based Pattern Identification System with Questionnaire Optimization (설문 최적화를 통한 개방형 웹 변증 시스템 개발)

  • Lee, Jae Chul;Jin, Hee Jeong
    • Journal of Physiology & Pathology in Korean Medicine / v.27 no.6 / pp.827-831 / 2013
  • This study aims to develop a pattern identification system (PIS) that lets general users check their body condition. We collected three previous PI questionnaires whose internal consistency reliability or diagnostic validity had been confirmed by experts through a field test. To define the weights of the pattern indices, we applied the analytic hierarchy process (AHP) with 11 experts. The PIS receives two kinds of symptoms from users: body region-based symptoms and core symptoms for PIS. The PIS suggests possible patterns and health information based on the selected symptoms, analyzed with AHP. This study showed that, compared with a conventional PI system, the PIS can be used easily by general users who want to access Korean Medicine. Furthermore, it could be used in mobile environments or for remote medical care.
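As an illustration of how AHP turns pairwise expert judgments into weights, the sketch below uses the row geometric mean method, one common way to approximate the AHP priority vector. The abstract does not say which variant was used or how the 11 experts' judgments were combined, so the function name `ahp_weights` and the example comparison matrix are assumptions.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric mean method.

    pairwise[i][j] states how much more important index i is than index j,
    with pairwise[j][i] expected to be its reciprocal.
    """
    size = len(pairwise)
    if any(len(row) != size for row in pairwise):
        raise ValueError("pairwise comparison matrix must be square")
    # Geometric mean of each row, then normalize so the weights sum to 1.
    means = [math.prod(row) ** (1.0 / size) for row in pairwise]
    total = sum(means)
    return [m / total for m in means]


if __name__ == "__main__":
    # Hypothetical judgments for three symptom indices:
    # index 0 is twice as important as index 1 and four times index 2.
    matrix = [
        [1.0, 2.0, 4.0],
        [0.5, 1.0, 2.0],
        [0.25, 0.5, 1.0],
    ]
    for index, weight in enumerate(ahp_weights(matrix)):
        print(f"index {index}: weight {weight:.3f}")
```

For a perfectly consistent matrix such as this one, the geometric mean weights coincide with the principal eigenvector that AHP is usually defined by; for inconsistent judgments the two can differ slightly.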

Design of the web data mining system and definition of useful access patterns (웹 마이닝 시스템 설계 및 유용한 접근 패턴 정의)

  • 김종달;김성민;남도원;이동하;이전영
    • Proceedings of the Korea Inteligent Information System Society Conference / 2000.04a / pp.283-291 / 2000
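  • One of the things Internet service providers are interested in is analyzing the service usage patterns and trends of Internet users, because this helps them understand user characteristics that contribute to increased sales and to actual business management. The basic approach is to analyze user patterns from the web logs left on a web server when users access it. Typical statistical techniques are used for web log analysis. However, simple statistical techniques alone are limited in finding useful information hidden among data that is not already known. Recently, new approaches based on data mining technology have been attempted to overcome this limitation. In practice, however, applying data mining to web logs is hard because of the difficulty of preprocessing and the problem of how to define patterns that are actually useful. In this study, we examine the preprocessing steps for finding useful patterns in raw web logs and present a transaction data structure suitable for a web mining system. We then examine the pattern discovery process over the defined data structure: statistical techniques that use the concept hierarchy of a web site, and association rule mining. Finally, we define new useful patterns over the defined data structure.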
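The preprocessing and rule-mining steps mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the paper's data structure: the record layout `(user, unix_time, url)`, the 30-minute session timeout, and the use of the first URL path segment as a stand-in for the site's concept hierarchy are all assumptions.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; an assumed cut-off between sessions


def category(url):
    """Map a URL to a coarse concept level: its first path segment (assumed)."""
    return "/" + url.strip("/").split("/")[0]


def to_transactions(records, timeout=SESSION_TIMEOUT):
    """Group (user, unix_time, url) log records into per-session category sets."""
    by_user = defaultdict(list)
    for user, timestamp, url in records:
        by_user[user].append((timestamp, url))
    transactions = []
    for visits in by_user.values():
        visits.sort()
        current, last_seen = set(), None
        for timestamp, url in visits:
            # A long pause between requests starts a new session.
            if last_seen is not None and timestamp - last_seen > timeout:
                transactions.append(current)
                current = set()
            current.add(category(url))
            last_seen = timestamp
        if current:
            transactions.append(current)
    return transactions


def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of the association rule antecedent -> consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


if __name__ == "__main__":
    log = [
        ("u1", 0, "/news/a"), ("u1", 60, "/shop/x"), ("u1", 5000, "/news/b"),
        ("u2", 10, "/news/c"), ("u2", 100, "/shop/y"), ("u3", 0, "/shop/z"),
    ]
    sessions = to_transactions(log)
    print(sessions)
    print(confidence(sessions, {"/news"}, {"/shop"}))
```

Working at the category level rather than on raw URLs keeps the transactions small and makes rules such as "/news leads to /shop" easier to interpret, which is one motivation for using a concept hierarchy.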

HFPD Analysis Using Fractal and Statistical Methods (프랙탈 및 통계적 방법을 이용한 HFPD 분석)

  • Jung, Young-Ill;Lim, Yong-Bae;Kim, Duck-Keun
    • Proceedings of the Korean Institute of Electrical and Electronic Material Engineers Conference / 2002.07b / pp.927-930 / 2002
  • The HFPD measurement method analyzes the aging state of high-voltage insulation materials by detecting signals at higher frequencies than the conventional PD measurement method; it is therefore less affected by noise and can be used for live-line measurement. By accumulating HFPD signals over a fixed interval and statistically processing the accumulated signals to obtain a fractal dimension, the main discharge phenomena can be analyzed and the progress of aging in the insulation material can be assessed. In this study, the statistical parameters (skewness and kurtosis) and the fractal dimensions change with the discharge patterns, which show different characteristics depending on the applied voltage and time.
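The two statistical parameters named in this abstract can be computed from accumulated pulse data as in the sketch below. It uses the standard population-moment definitions of skewness and kurtosis; the sample amplitude values are invented for illustration, and the abstract does not specify which exact estimator the authors used.

```python
def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Asymmetry of the distribution: third moment over variance ** 1.5."""
    variance = central_moment(values, 2)
    if variance == 0:
        raise ValueError("skewness is undefined for constant data")
    return central_moment(values, 3) / variance ** 1.5


def kurtosis(values):
    """Peakedness of the distribution: fourth moment over variance ** 2.

    This is the non-excess form, which equals 3 for a normal distribution.
    """
    variance = central_moment(values, 2)
    if variance == 0:
        raise ValueError("kurtosis is undefined for constant data")
    return central_moment(values, 4) / variance ** 2


if __name__ == "__main__":
    # Hypothetical accumulated pulse amplitudes (arbitrary units).
    amplitudes = [1.0, 1.0, 1.0, 10.0]
    print("skewness:", skewness(amplitudes))
    print("kurtosis:", kurtosis(amplitudes))
```

A positive skewness here indicates a long tail of large pulses, which is the kind of shape change the abstract associates with different discharge patterns.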
