• Title/Summary/Keyword: frequent item set

A Study on the Implementation of an optimized Algorithm for association rule mining system using Fuzzy Utility (Fuzzy Utility를 활용한 연관규칙 마이닝 시스템을 위한 알고리즘의 구현에 관한 연구)

  • Park, In-Kyu;Choi, Gyoo-Seok
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.20 no.1 / pp.19-25 / 2020
  • In frequent pattern mining, the uncertainty of each item is accompanied by a loss of information. Also, in real environments, the importance of patterns changes with time, so fuzzy logic must be applied to meet these requirements, and the dynamic characteristics of pattern importance should be considered. In this paper, we propose a fuzzy utility mining technique for extracting frequent web page sets from web log databases through fuzzy utility-based web page set mining. Here, the downward closure property of the fuzzy set is applied to prune a large part of the search space using the minimum fuzzy utility threshold (MFUT) and the user-defined percentile (UDP). Extensive performance analyses show that our algorithm is very efficient and scalable for fuzzy utility mining using dynamic weights.

Portion sizes of foods frequently consumed by the Korean elderly: Data from KNHANES IV-2

  • Kim, Sook-Bae;Kim, Soon-Kyung;Kim, Se-Na;Kim, So-Young;Cho, Young-Sook;Kim, Mi-Hyun
    • Nutrition Research and Practice / v.5 no.6 / pp.553-559 / 2011
  • The purpose of this study was to define a one-portion size of foods frequently consumed by Koreans aged 65 years or over. From the original 8,631 people who took part in the Fourth Korea National Health and Nutrition Examination Survey (KNHANES IV-2) 2008, we analyzed the data on 1,458 persons (16.9%) aged 65 and over, and selected food items with an intake frequency of 30 or more across all participants. A total of 158 varieties of food items were selected. The portion size of each food item was set on the basis of the median amount (50th percentile) in a single intake by a single person. In the cereals category, 13 items were selected, of which the most frequently consumed item was well-polished rice with a portion size of 75 g. Among legumes, 7 items were selected, of which the most frequent item was dried black soybean with a portion size of 6 g. Among the 16 groups, the most varied food group (49 items) was vegetables, and among these the most frequently occurring item was garlic (5 g), while in the fruit group only 11 items were selected, as their intake frequency was low. Fish and shellfish were more frequently consumed by the elderly than meats. The most frequently consumed meat was pork loin, with a portion size of 30 g. In fish and shellfish, the most frequently consumed item was dried and boiled large anchovy with a portion size of 2 g. Portion sizes for food items consumed regularly by the elderly may be conveniently and effectively used in dietary planning, in nutritional education programs, and in assessing the dietary intake status of the elderly.

Deterministic EOQ Model with Partial Backordering when Purchase Dependence Exists (구매종속성이 존재하는 상황에서 부분 부재고 EOQ 모형에 대한 고찰)

  • Park, Changkyu
    • Korean Management Science Review / v.32 no.1 / pp.65-82 / 2015
  • Purchase dependence is a frequent phenomenon in retail shops and is characterized by the purchase of certain items together due to their unknown interior associations. Although this concept has been significantly examined in the marketing field (e.g. market basket analysis), it has largely remained unaddressed in operations management. Since purchase dependence is an important factor in designing inventory replenishment policies, this paper demonstrates the means of applying it to the partial backordering inventory model. Through computational analyses, this paper compares the performance of inventory models that either consider or ignore purchase dependence; the results demonstrate that inventory models that ignore purchase dependence incur more average cost per unit time than the model that considers purchase dependence, and the impact of purchase dependence can increase in significance as the item set becomes more closely correlated with regard to order demand.

Partial Backordering Inventory Model under Purchase Dependence

  • Park, Changkyu
    • Industrial Engineering and Management Systems / v.14 no.3 / pp.275-288 / 2015
  • Purchase dependence is a frequent phenomenon in retail shops and is characterized by the purchase of certain items together due to their unknown interior associations. Although this concept has been significantly examined in the marketing field (e.g. market basket analysis), it has largely remained unaddressed in operations management. Since purchase dependence is an important factor in designing inventory replenishment policies, this paper demonstrates the means of applying it to the partial backordering inventory model. Through computational analyses, this paper compares the performance of inventory models that either consider or ignore purchase dependence; the results demonstrate that inventory models that ignore purchase dependence incur more average cost per unit time than the model that considers purchase dependence, and the impact of purchase dependence can increase in significance as the item set becomes more closely correlated with regard to order demand.

Finding Pseudo Periods over Data Streams based on Multiple Hash Functions (다중 해시함수 기반 데이터 스트림에서의 아이템 의사 주기 탐사 기법)

  • Lee, Hak-Joo;Kim, Jae-Wan;Lee, Won-Suk
    • Journal of Information Technology Services / v.16 no.1 / pp.73-82 / 2017
  • Recently, in-memory data stream processing has been actively applied to various subjects such as query processing, OLAP, and data mining (e.g., frequent item sets, association rules, and clustering). However, finding regular periodic patterns of events in an infinite data stream has received less attention. Most studies on finding periods use autocorrelation functions to find certain changes in periodic patterns, not the period itself, and they usually look for periodic patterns in time-series databases, not in data streams. Literally, a period is the length of time after which some phenomenon recurs. In real applications, however, a data set evolves with tiny differences as time elapses, and a period of this kind is called a pseudo-period. This paper proposes a new scheme called the FPMH (Finding Periods using Multiple Hash functions) algorithm to find such a set of pseudo-periods over a data stream based on multiple hash functions. According to the type of pseudo-period, this paper categorizes FPMH into three variants: FPMH-E, FPMH-PC, and FPMH-PP. To maximize the performance of the algorithm in the data stream environment and to keep the most recent periodic patterns in memory, we applied a decay mechanism to the FPMH algorithms. The FPMH algorithm minimizes memory usage as well as processing time with acceptable accuracy.

Anomalous Event Detection in Traffic Video Based on Sequential Temporal Patterns of Spatial Interval Events

  • Ashok Kumar, P.M.;Vaidehi, V.
    • KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.1 / pp.169-189 / 2015
  • Detection of anomalous events from video streams is a challenging problem in many video surveillance applications. One such application that has received significant attention from the computer vision community is traffic video surveillance. In this paper, a Lossy Count based Sequential Temporal Pattern mining approach (LC-STP) is proposed for detecting spatio-temporal abnormal events (such as a traffic violation at a junction) from sequences of video streams. The proposed approach relies mainly on spatial abstractions of each object, mining frequent temporal patterns in a sequence of video frames to form a regular temporal pattern. In order to detect each object in every frame, the input video is first pre-processed by applying Gaussian Mixture Models. After the detection of foreground objects, tracking is carried out using block motion estimation by the three-step search method. The primitive events of each object are represented by assigning spatial and temporal symbols corresponding to their location and time information. These primitive events are analyzed to form a temporal pattern in a sequence of video frames, representing the temporal relations between the primitive events of various objects. This is repeated for each window of sequences, and the support for each temporal sequence is obtained based on LC-STP to discover regular patterns of normal events. Events deviating from these patterns are identified as anomalies. Unlike traditional frequent item set mining methods, the proposed method generates maximal frequent patterns without candidate generation. Furthermore, experimental results show that the proposed method performs well and can detect video anomalies in real traffic video data.
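
To make the "Lossy Count" idea concrete, here is a minimal sketch of classic lossy counting over a stream of single items. It is not the LC-STP method itself, which the abstract applies to temporal patterns; the function names `lossy_count` and `frequent_items` and the parameters `epsilon` and `support` are illustrative assumptions.

```python
import math


def lossy_count(stream, epsilon):
    """Approximate item counts over a stream with error at most epsilon * n.

    Generic lossy counting on single items (an assumption for illustration),
    not the LC-STP pattern miner described in the abstract.
    """
    width = math.ceil(1 / epsilon)  # number of items per bucket
    counts = {}                     # item -> [frequency, max possible undercount]
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)  # id of the current bucket
        if item in counts:
            counts[item][0] += 1
        else:
            # A new entry may have been missed in all earlier buckets.
            counts[item] = [1, bucket - 1]
        if n % width == 0:
            # At each bucket boundary, drop entries that cannot be frequent.
            stale = [k for k, (f, d) in counts.items() if f + d <= bucket]
            for key in stale:
                del counts[key]
    return counts, n


def frequent_items(counts, n, support, epsilon):
    """Return items whose stored frequency is at least (support - epsilon) * n."""
    return {k for k, (f, _) in counts.items() if f >= (support - epsilon) * n}
```

The memory bound comes from the pruning step: rare items are discarded at every bucket boundary, so only items that could still reach the support threshold are kept.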

The Influence of Change Prevalence on Visual Short-Term Memory-Based Change Detection Performance (변화출현확률이 시각단기기억 기반 변화탐지 수행에 미치는 영향)

  • Son, Han-Gyeol;Hyun, Joo-Seok
    • Korean Journal of Cognitive Science / v.32 no.3 / pp.117-139 / 2021
  • Change detection, in which the presence of a different item is determined between memory and test arrays separated by a brief time interval, resembles visual search in that the different item is searched for at the onset of the test array, which is compared against the items in memory. Based on this resemblance, the present study examined whether varying the probability of change occurrence in a visual short-term memory-based change detection task can influence response decision making (i.e., the change prevalence effect). The simple-feature change detection task in the study consisted of a set of four colored boxes followed by another set of four colored boxes, between which the participants determined the presence or absence of a color change from one box to the other. The change prevalence was set to 20, 50, or 80% of total trials, and change detection errors, detection sensitivity, and the subsequent RTs were analyzed. The analyses revealed that as the change prevalence increased, false alarms became more frequent while misses became less frequent, along with delayed correct-rejection responses. The observed change prevalence effect looks very similar to the target prevalence effect, which varies according to the probability of target occurrence in visual search tasks, indicating that the principles underlying these two effects may resemble each other.

An Investigation on Expanding Co-occurrence Criteria in Association Rule Mining (연관규칙 마이닝에서의 동시성 기준 확장에 대한 연구)

  • Kim, Mi-Sung;Kim, Nam-Gyu;Ahn, Jae-Hyeon
    • Journal of Intelligence and Information Systems / v.18 no.1 / pp.23-38 / 2012
  • There is a large difference between purchasing patterns in an online shopping mall and in an offline market. This difference may be caused mainly by the difference in accessibility of online and offline markets. This means that the interval between the initial purchasing decision and its realization appears to be relatively short in an online shopping mall, because a customer can place an order immediately. Because of the short interval between a purchasing decision and its realization, an online shopping mall transaction usually contains fewer items than that of an offline market. In an offline market, customers usually keep some items in mind and buy them all at once a few days after deciding to buy them, instead of buying each item individually and immediately. In contrast, more than 70% of online shopping mall transactions contain only one item. This statistic implies that traditional data mining techniques cannot be directly applied to online market analysis, because hardly any association rules can survive with an acceptable level of Support when there are too many Null Transactions. Most market basket analyses of online shopping mall transactions, therefore, have been performed by expanding the co-occurrence criteria of traditional association rule mining. While the traditional co-occurrence criterion defines items purchased in one transaction as concurrently purchased items, the expanded co-occurrence criterion regards items purchased by a customer during some predefined period (e.g., a day) as concurrently purchased items. In studies using expanded co-occurrence criteria, however, the criteria have been defined arbitrarily by researchers without any theoretical grounds or agreement. The lack of clear grounds for adopting a certain co-occurrence criterion degrades the reliability of the analytical results. Moreover, it is hard to derive new meaningful findings by combining the outcomes of previous individual studies.
In this paper, we attempt to compare expanded co-occurrence criteria and propose a guideline for selecting an appropriate one. First of all, we compare the accuracy of association rules discovered according to various co-occurrence criteria. By doing this experiment we expect that we can provide a guideline for selecting appropriate co-occurrence criteria that correspond to the purpose of the analysis. Additionally, we will perform similar experiments with several groups of customers that are segmented by each customer's average duration between orders. By this experiment, we attempt to discover the relationship between the optimal co-occurrence criteria and the customer's average duration between orders. Finally, by a series of experiments, we expect that we can provide basic guidelines for developing customized recommendation systems. Our experiments use a real dataset acquired from one of the largest internet shopping malls in Korea. We use 66,278 transactions of 3,847 customers conducted during the last two years. Overall results show that the accuracy of association rules of frequent shoppers (whose average duration between orders is relatively short) is higher than that of casual shoppers. In addition, we discover that with frequent shoppers, the accuracy of association rules appears very high when the co-occurrence criterion of the training set corresponds to that of the validation set (i.e., target set). This implies that the co-occurrence criterion of frequent shoppers should be set according to the application purpose period. For example, an analyst should use a day as a co-occurrence criterion if he/she wants to offer a coupon valid only for a day to potential customers who will use the coupon. In contrast, an analyst should use a month as a co-occurrence criterion if he/she wants to publish a coupon book that can be used for a month.
In the case of casual shoppers, the accuracy of association rules appears not to be affected by the period of the application purpose. The accuracy of the casual shoppers' association rules becomes higher when a longer co-occurrence criterion is adopted. This implies that an analyst has to set the co-occurrence criterion to be as long as possible, regardless of the application purpose period.
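
The expanded co-occurrence criterion described above can be sketched as a basket-building step: items bought by the same customer within the same period are merged into one basket before rule mining. The function name `build_baskets`, the `(customer_id, order_date, item)` record layout, and the use of fixed calendar-aligned windows are assumptions made for illustration; the abstract does not specify how the authors delimit periods.

```python
from collections import defaultdict
from datetime import date


def build_baskets(orders, window_days):
    """Group purchased items into baskets per customer and time window.

    orders: iterable of (customer_id, order_date, item) tuples (assumed layout).
    window_days: length of the co-occurrence period in days; 1 means that
    items bought by one customer on the same day form one basket.
    """
    baskets = defaultdict(set)
    for customer, day, item in orders:
        # Fixed windows aligned to the proleptic ordinal day number
        # (a simplifying assumption, not the paper's definition).
        window = day.toordinal() // window_days
        baskets[(customer, window)].add(item)
    return [frozenset(items) for items in baskets.values()]


orders = [
    ("c1", date(2012, 1, 2), "shirt"),
    ("c1", date(2012, 1, 2), "socks"),
    ("c1", date(2012, 1, 3), "belt"),
    ("c2", date(2012, 1, 2), "hat"),
]
daily_baskets = build_baskets(orders, window_days=1)
```

Lengthening `window_days` merges more single-item orders into multi-item baskets, which is how the expanded criterion reduces the share of transactions that cannot support any rule.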

Personalized Recommendation System using FP-tree Mining based on RFM (RFM기반 FP-tree 마이닝을 이용한 개인화 추천시스템)

  • Cho, Young-Sung;Ho, Ryu-Keun
    • Journal of the Korea Society of Computer and Information / v.17 no.2 / pp.197-206 / 2012
  • Existing recommendation systems that use association rules suffer from slow processing caused by frequent scans of large data, as well as from problems of scalability and accuracy. In this paper, using an implicit method that does not require user profiles for rating, we propose a personalized recommendation system based on FP-tree mining with RFM. RFM analysis and FP-tree mining are combined so as to reflect the attributes of customers and items, based on the data of all customers and their purchase data, in order to find items with high purchasability. The proposed system finds frequent items and creates association rules using RFM-based FP-tree mining without generating candidate sets. Recommendable items are generated according to basic thresholds for association rules on support, confidence, and lift, so that items can be recommended efficiently. To estimate performance, the proposed system is compared with an existing system. As a result, experiments with a dataset collected from a cosmetics internet shopping mall indicate an improvement, evaluated according to the criteria of logicality.
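
The three rule thresholds named above (support, confidence, and lift) have standard definitions, sketched below for a single rule over a list of transactions. This is a direct-counting illustration, not the FP-tree procedure of the paper; the function name `rule_metrics` and the sample product names are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    transactions: non-empty list of sets of items.
    Counts are taken by scanning the list directly, which is simple but
    slower than the FP-tree approach mentioned in the abstract.
    """
    a = frozenset(antecedent)
    c = frozenset(consequent)
    n = len(transactions)
    count_a = sum(1 for t in transactions if a <= t)
    count_c = sum(1 for t in transactions if c <= t)
    count_ac = sum(1 for t in transactions if (a | c) <= t)

    support = count_ac / n                                # P(A and C)
    confidence = count_ac / count_a if count_a else 0.0   # P(C | A)
    lift = confidence / (count_c / n) if count_c else 0.0 # P(C | A) / P(C)
    return support, confidence, lift


transactions = [
    {"toner", "lotion"},
    {"toner", "lotion", "cream"},
    {"toner"},
    {"cream"},
]
support, confidence, lift = rule_metrics(transactions, {"toner"}, {"lotion"})
```

A lift above 1 indicates that the consequent appears with the antecedent more often than its overall frequency would suggest, which is why lift is often used alongside support and confidence when filtering rules.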

Comparison of Association Rule Learning and Subgroup Discovery for Mining Traffic Accident Data (교통사고 데이터의 마이닝을 위한 연관규칙 학습기법과 서브그룹 발견기법의 비교)

  • Kim, Jeongmin;Ryu, Kwang Ryel
    • Journal of Intelligence and Information Systems / v.21 no.4 / pp.1-16 / 2015
  • Traffic accidents have been one of the major causes of death worldwide for the last several decades. According to statistics from the World Health Organization, approximately 1.24 million deaths occurred on the world's roads in 2010. In order to reduce future traffic accidents, multipronged approaches have been adopted, including traffic regulations, injury-reducing technologies, driver training programs, and so on. Records on traffic accidents are generated and maintained for this purpose. To make these records meaningful and effective, it is necessary to analyze the relationship between traffic accidents and related factors, including vehicle design, road design, weather, driver behavior, etc. Insights derived from these analyses can be used for accident prevention. Traffic accident data mining is an activity to find useful knowledge about such relationships that is not well known and that users may be interested in. Many studies on mining accident data have been reported over the past two decades. Most of these studies focused mainly on predicting the risk of accidents using accident-related factors. Supervised learning methods such as decision trees, logistic regression, k-nearest neighbors, and neural networks are used for this prediction. However, the prediction models derived from these algorithms are too complex for humans to understand, because the main purpose of these algorithms is prediction, not explanation of the data. Some studies use unsupervised clustering algorithms to divide the data into several groups, but the derived groups themselves are still not easy for humans to understand, so some additional analytic work is necessary. Rule-based learning methods are adequate when we want to derive a comprehensible form of knowledge about the target domain. They derive a set of if-then rules that represent the relationship between the target feature and other features.
Rules are fairly easy for humans to understand, and can therefore help provide insight and comprehensible results. Association rule learning methods and subgroup discovery methods are representative rule-based learning methods for descriptive tasks. These two kinds of algorithms have been used in a wide range of areas, from transaction analysis and accident data analysis to the detection of statistically significant patient risk groups and the discovery of key persons in social communities. We use both the association rule learning method and the subgroup discovery method to discover useful patterns from a traffic accident dataset consisting of many features, including the profile of the driver, the location of the accident, the type of accident, information on the vehicle, violation of regulations, and so on. The association rule learning method, which is an unsupervised learning method, searches for frequent item sets in the data and translates them into rules. In contrast, the subgroup discovery method is a kind of supervised learning method that discovers rules for user-specified concepts satisfying a certain degree of generality and unusualness. Depending on which aspect of the data we are focusing on, we may combine multiple relevant features of interest to make a synthetic target feature and give it to the rule learning algorithms. After a set of rules is derived, some postprocessing steps are taken to make the rule set more compact and easier to understand by removing uninteresting or redundant rules. We conducted a set of experiments mining our traffic accident data in both unsupervised mode and supervised mode to compare these rule-based learning algorithms. Experiments with the traffic accident data reveal that association rule learning, in its pure unsupervised mode, can discover some hidden relationships among the features.
Under a supervised learning setting with a combinatorial target feature, however, the subgroup discovery method finds good rules much more easily than the association rule learning method, which requires a lot of effort to tune its parameters.
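
The two-step process described above, searching for frequent item sets and then translating them into rules, can be sketched with a small Apriori-style miner. This is a generic textbook procedure under assumed names (`frequent_itemsets`, `rules`) and made-up feature values, not the authors' implementation, and it omits subgroup discovery entirely.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all item sets meeting min_support.

    Level-wise search: candidates of size k are built only from frequent
    sets of size k - 1, since any superset of an infrequent set is infrequent.
    """
    n = len(transactions)
    current = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while current:
        level = {}
        for cand in current:
            sup = sum(1 for t in transactions if cand <= t) / n
            if sup >= min_support:
                level[cand] = sup
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-sets into k-sets, then prune candidates
        # that have any infrequent (k-1)-subset.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        current = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def rules(frequent, min_confidence):
    """Translate frequent item sets into (antecedent, consequent, confidence)."""
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                ante = frozenset(ante)
                conf = sup / frequent[ante]  # subsets of frequent sets are frequent
                if conf >= min_confidence:
                    out.append((ante, itemset - ante, conf))
    return out


accidents = [
    {"rain", "night", "skid"},
    {"rain", "skid"},
    {"rain", "night"},
    {"dry", "day"},
]
found = rules(frequent_itemsets(accidents, min_support=0.5), min_confidence=0.9)
```

The two thresholds `min_support` and `min_confidence` are the kind of parameters that, as the abstract notes, take effort to tune: set too high, few rules survive; set too low, the rule set becomes large and redundant.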