• Title/Summary/Keyword: interestingness

Exploration of relationship between confirmation measures and association thresholds (기준 확인 측도와 연관성 평가기준과의 관계 탐색)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.24 no.4 / pp.835-845 / 2013
  • Association rule mining is a data mining technique that quantifies the relevance between sets of items in a large database, and it has been applied in various fields such as manufacturing, shopping malls, healthcare, insurance, and education. Philosophers of science have proposed interestingness measures for various kinds of patterns, analyzed their theoretical properties, evaluated them empirically, and suggested strategies to select appropriate measures for particular domains and requirements. Such interestingness measures are divided into objective, subjective, and semantic measures. Objective measures are based on the data used in the discovery process and are typically motivated by statistical considerations. Subjective measures take into account not only the data but also the knowledge and interests of the users who examine the pattern, while semantic measures additionally take into account utility and actionability. In a very different context, researchers have devoted a lot of attention to measures of confirmation or evidential support. This paper focuses on asymmetric confirmation measures and compares them with basic association thresholds using simulated data. As a result, we could distinguish the direction of an association rule using confirmation measures and interpret the degree of association operationally. Furthermore, the results showed that the measure by Rips and the measure by Kemeny and Oppenheim performed better than the other confirmation measures.
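
The abstract above does not give formulas, so the following is only a minimal sketch of how a confirmation measure can reveal the direction of a rule. It uses the commonly cited Kemeny and Oppenheim form, F(H, E) = [P(E|H) - P(E|not H)] / [P(E|H) + P(E|not H)], and assumes that for a rule A → B the antecedent A plays the role of evidence E and the consequent B the role of hypothesis H. That mapping, the function names, and the example counts are illustrative assumptions, not the paper's definitions.

```python
def kemeny_oppenheim(a: int, b: int, c: int, d: int) -> float:
    """Kemeny-Oppenheim confirmation of B by A from a 2x2 contingency table.

    a = count(A and B),     b = count(A and not B),
    c = count(not A and B), d = count(not A and not B).
    Assumes B and not-B both occur and that A occurs at least once,
    so no denominator below is zero.
    """
    p_a_given_b = a / (a + c)        # P(A | B)
    p_a_given_not_b = b / (b + d)    # P(A | not B)
    return (p_a_given_b - p_a_given_not_b) / (p_a_given_b + p_a_given_not_b)


def confidence(a: int, b: int) -> float:
    """Ordinary confidence P(B | A); it is never negative."""
    return a / (a + b)


# Hypothetical counts: a positively associated, a negatively associated,
# and an independent pair of items.
positive = kemeny_oppenheim(40, 10, 10, 40)
negative = kemeny_oppenheim(10, 40, 40, 10)
independent = kemeny_oppenheim(25, 25, 25, 25)

print(positive, negative, independent)
print(confidence(40, 10), confidence(10, 40))
```

In this sketch the measure lies in [-1, +1] and changes sign with the direction of the association, while confidence stays non-negative in both cases.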

Utilization of similarity measures by PIM with AMP as association rule thresholds (모든 주변 비율을 고려한 확률적 흥미도 측도 기반 유사성 측도의 연관성 평가 기준 활용 방안)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.24 no.1 / pp.117-124 / 2013
  • Association rule mining is a data mining technique that quantifies the relationship between sets of items in a huge database, and it has been applied in various fields such as internet shopping malls, healthcare, insurance, and education. There are three primary interestingness measures for association rules: support, confidence, and lift. Confidence is the most important of these, and association rules are usually generated using confidence. However, it is an asymmetric measure and takes only positive values, which can cause difficulties when generating association rules. In this paper we apply similarity measures by the probabilistic interestingness measure (PIM) with all marginal proportions (AMP) to solve this problem. Comparative studies of support, confidence, lift, chi-square statistics, and several similarity measures by PIM with AMP are shown by numerical example. As a result, we found that the similarity measures by PIM with AMP reflect the degree of association in the same way as confidence. Because these measures carry a sign, we could also confirm the direction of association and select the best similarity measure by PIM with AMP.
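
As a minimal sketch of the three basic measures named in the abstract above, the snippet below computes support, confidence, and lift for a rule A → B using their standard definitions: support = P(A and B), confidence = P(B | A), lift = P(A and B) / (P(A) P(B)). The function name and the toy transactions are hypothetical and not taken from the paper.

```python
from typing import Dict, Iterable, List, Set


def rule_measures(transactions: List[Set[str]],
                  antecedent: Iterable[str],
                  consequent: Iterable[str]) -> Dict[str, float]:
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    a, b = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)          # contain A
    n_b = sum(1 for t in transactions if b <= t)          # contain B
    n_ab = sum(1 for t in transactions if (a | b) <= t)   # contain A and B
    support = n_ab / n
    confidence = n_ab / n_a if n_a else float("nan")
    lift = confidence / (n_b / n) if n_b else float("nan")
    return {"support": support, "confidence": confidence, "lift": lift}


# Hypothetical market-basket data.
baskets = [
    {"milk", "bread"},
    {"milk"},
    {"bread"},
    {"milk", "bread"},
    {"butter"},
]
measures = rule_measures(baskets, {"milk"}, {"bread"})
print(measures)
```

All three values are non-negative, which illustrates why they cannot by themselves signal a negative association.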

Bounds of PIM-based similarity measures with partially marginal proportion (부분적 주변 비율에 의한 확률적 흥미도 측도 기반 유사성 측도의 상한 및 하한의 설정)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.26 no.4 / pp.857-864 / 2015
  • According to Wikipedia, data mining is the computational process of discovering patterns in huge data sets, involving methods at the intersection of association rules, decision trees, clustering, artificial intelligence, and machine learning. Clustering, or cluster analysis, is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups. The similarity measures used in clustering may be classified into various types depending on the characteristics of the data. In this paper, we computed bounds for similarity measures based on the probabilistic interestingness measure with partially marginal probability, such as the Peirce I, Peirce II, Cole I, Cole II, Loevinger, Park I, and Park II measures. We confirmed that the absolute value of the Loevinger measure is the upper limit of the absolute value of every other existing measure. The ordering of the other measures is determined by the size of the concurrence proportion, the non-simultaneous occurrence proportion, and the mismatch proportion.

A study on the ordering of PIM family similarity measures without marginal probability (주변 확률을 고려하지 않는 확률적 흥미도 측도 계열 유사성 측도의 서열화)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.26 no.2 / pp.367-376 / 2015
  • Today, big data has become a hot keyword; it may be defined as a collection of data sets so huge and complex that they become difficult to process by traditional methods. Clustering identifies the information in a big database by assigning a set of objects to clusters so that the objects in the same cluster are more similar to each other than to those in other clusters. The similarity measures used in cluster analysis may be classified into various types depending on the nature of the data. In this paper, we computed upper and lower limits for similarity measures based on the probabilistic interestingness measure without marginal probability, such as the Yule I and II, Michael, Digby, Baulieu, and Dispersion measures, and we compared these measures using real data and a simulated experiment. According to Warrens (2008), coefficients that have the same quantities in the numerator and denominator, that are bounded, and that are close to each other in the ordering are likely to be more similar. Thus, results on bounds provide a means of classifying various measures. Knowing which coefficients are similar also provides insight into the stability of a given algorithm.
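
To illustrate what a bounded similarity measure that ignores marginal totals looks like, here is a small sketch of Yule's Q and Yule's Y computed from a 2x2 table. It assumes that "Yule I and II" in the abstract above refer to these two classical coefficients, Q = (ad - bc) / (ad + bc) and Y = (sqrt(ad) - sqrt(bc)) / (sqrt(ad) + sqrt(bc)); that identification and the example counts are assumptions, not statements from the paper.

```python
import math


def yule_q(a: float, b: float, c: float, d: float) -> float:
    """Yule's Q for a 2x2 table with cells a, b (first row) and c, d (second row).

    Assumes a*d + b*c > 0.
    """
    return (a * d - b * c) / (a * d + b * c)


def yule_y(a: float, b: float, c: float, d: float) -> float:
    """Yule's Y (coefficient of colligation) for the same table."""
    root_ad = math.sqrt(a * d)
    root_bc = math.sqrt(b * c)
    return (root_ad - root_bc) / (root_ad + root_bc)


# Hypothetical counts: a = both present, b and c = exactly one present,
# d = both absent.
q = yule_q(40, 10, 10, 40)
y = yule_y(40, 10, 10, 40)
print(q, y)
```

Both coefficients depend only on the products ad and bc and stay within [-1, +1]; for this table |Y| is smaller than |Q|, which is the kind of ordering that results on bounds make explicit.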

Development of association rule threshold by balancing of relative rule accuracy (상대적 규칙 정확도의 균형화에 의한 연관성 측도의 개발)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.25 no.6 / pp.1345-1352 / 2014
  • Data mining is the representative methodology for obtaining meaningful information in the era of big data. According to Wikipedia, association rule learning is a popular and well-researched method for discovering interesting relationships between itemsets in large databases using association thresholds. It is intended to identify strong rules discovered in databases using different interestingness measures. Unlike general association rule mining, inverse association rule mining finds rules stating that a particular item does not occur if another item does not occur. If the two types of association rule can be considered simultaneously, we can obtain marketing information for related products as well as for a specific product. In this paper, we propose a balanced attributable relative accuracy applicable to these association rule techniques, and then check it against the three conditions for interestingness measures given by Piatetsky-Shapiro (1991). Comparative studies of rule accuracy, relative accuracy, attributable relative accuracy, and balanced attributable relative accuracy are shown by numerical example. The results show that balanced attributable relative accuracy performs better than the other accuracy measures.
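
The abstract above does not restate the three conditions, so the sketch below relies on their commonly cited formulation: (1) the measure is 0 when A and B are independent, (2) it increases with P(A and B) when P(A) and P(B) are fixed, and (3) it decreases with P(A) when P(A and B) and P(B) are fixed. As an illustration it spot-checks them on leverage, P(A and B) - P(A)P(B), rather than on the paper's proposed measure, whose formula is not given here. The function name and the probability values are hypothetical.

```python
def leverage(p_ab: float, p_a: float, p_b: float) -> float:
    """Leverage of the rule A -> B: P(A and B) - P(A) * P(B)."""
    return p_ab - p_a * p_b


# Condition 1: the measure is zero under independence, P(A and B) = P(A) P(B).
cond1 = abs(leverage(0.3 * 0.5, 0.3, 0.5)) < 1e-12

# Condition 2: with P(A) and P(B) fixed, a larger P(A and B) gives a larger value.
cond2 = leverage(0.25, 0.3, 0.5) > leverage(0.20, 0.3, 0.5)

# Condition 3: with P(A and B) and P(B) fixed, a larger P(A) gives a smaller value.
cond3 = leverage(0.20, 0.4, 0.5) < leverage(0.20, 0.3, 0.5)

print(cond1, cond2, cond3)
```

These are spot checks at a few points, not a proof of monotonicity; a new measure would need the same conditions verified analytically.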

Common-Sense Knowledge based Post-Processing Technique in Data Mining (데이터 마이닝에서의 상식 기반 후처리 기법)

  • Lee, In-Gi;Yong, Hwan-Seung
    • Proceedings of the Korean Information Science Society Conference / 2011.06a / pp.25-28 / 2011
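  • Data mining algorithms that aim to discover new knowledge and patterns suffer from the problem of generating a large number of rules. Recently, various interestingness studies have been carried out in the post-processing stage of data mining as a way to address this problem. However, these approaches also exhibit a bottleneck in the knowledge acquisition process and therefore fail to filter out the many rules that are merely common sense. To solve this problem, this study defines and implements a Common-Sense measure based on common-sense knowledge. How close a rule is to common sense is measured through similarity analysis using a semantic dimension replacement technique.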

The Effect of Animation on Comprehension and Interest (애니매이션이 이해와 흥미에 미치는 효과)

  • Kim, Sung-il;Whang, Sang-min;Barbara Tversky;Julie Morrison
    • Proceedings of the Korean Society for Cognitive Science Conference / 2002.05a / pp.85-91 / 2002
  • This study was conducted to investigate the interaction effects of various presentation types of graphics and individual differences in need for cognition (NFC) on comprehension, interestingness, and motivation. The depiction of the operation of a bicycle tire pump was presented in one of the following conditions: (a) simultaneous presentation, (b) successive presentation, (c) self-paced presentation, or (d) animation. For younger students, animated graphics were rated more enjoyable and motivating only when the students were low in NFC. For those high in NFC, animated graphics were not more effective than static graphics in terms of comprehension, interest, and motivation. For older students, on the other hand, self-paced static graphics were more interesting and enjoyable than animated graphics regardless of NFC score. These results suggest that animated graphics are not always beneficial for learning and motivation.

An Analysis on Internet Information using Real Time Search Words (실시간 검색어 분석을 이용한 인터넷 정보 관심도 분석)

  • Noh, Giseop
    • The Journal of the Convergence on Culture Technology / v.4 no.4 / pp.337-341 / 2018
  • As online media continues to evolve and the mobile computing environment has improved dramatically, the distribution of Internet information has rapidly changed from one-sided to consumer-oriented. Measuring interest in Internet information has therefore become an important issue for both suppliers and consumers. In this paper, we analyze interest in Internet information by analyzing the duration of real-time queries, using data collected over one month from the real-time search word service provided by a domestic Internet information provider.

The Effect of Online News Use Motivation on Acceptance and Satisfaction: A Comparative Study on Korean and Chinese University Students (온라인 뉴스 이용 동기가 수용의도와 만족도에 미치는 영향 - 한·중 대학생을 비교 중심으로 -)

  • Wang, Shang;An, Su-keon
    • The Journal of the Korea Contents Association / v.20 no.6 / pp.293-311 / 2020
  • Recently, studying the reasons for using media has become more important than studying which media is selected. This paper surveys respondents in different countries. According to the results for hypothesis 1, interestingness, informality, restlessness, news pursuit, and convenience have a positive (+) influence on satisfaction when college students in South Korea use online news. For Chinese college students, the motivations of news pursuit, habituation, interactivity, and convenience have a positive (+) influence on satisfaction with online news. For hypothesis 2, the interestingness, informality, habituation, and convenience motivations of college students in South Korea are reflected in their intention to accept online news, while for Chinese college students, informality, habituation, and convenience are reflected in that intention. Finally, for hypothesis 3, satisfaction with online news has a positive (+) influence on the level of acceptance of online news. In addition, the PLS path coefficients differ between college students in South Korea and China, and the motivations and purposes for using online news differ between the two countries because of the characteristics and cultures of their news media, so satisfaction also differs.

The proposition of compared and attributably pure confidence in association rule mining (연관 규칙 마이닝에서 비교 기여 순수 신뢰도의 제안)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.24 no.3 / pp.523-532 / 2013
  • Generally, data mining is the process of analyzing big data from different perspectives and summarizing it into useful information. The most widely used data mining technique is association rule generation, which finds the relevance between two items in a huge database. This technique has been used to find the relationship between sets of items based on interestingness measures such as support, confidence, and lift. Among the many interestingness measures, confidence is the most frequently used, but it has the drawback that it cannot determine the direction of the association. The attributably pure confidence and the compared confidence are able to determine the direction of the association, but their ranges are not [-1, +1], so the degree of association cannot be interpreted operationally from their values. This paper proposes a compared and attributably pure confidence to compensate for this drawback, and then describes some properties of the proposed measure. Comparative studies of confidence, compared confidence, attributably pure confidence, and the proposed measure are shown by numerical example. The results show that the compared and attributably pure confidence performs better than the other confidence measures.