• Title/Summary/Keyword: Association Rules Mining

A Case Study on Characteristics of Gender and Major in Career Preparation of University Students from Low-income Families: Application of Text Frequency Analysis and Association Rules (저소득층 대학생들의 진로준비과정에서의 성별·전공별 특성에 대한 사례연구: 텍스트 빈도분석과 연관분석의 적용)

  • Lee, Jihye;Lee, Shinhye
    • Journal of Digital Convergence / v.16 no.12 / pp.61-69 / 2018
  • This study aims to understand the career preparation experiences of low-income university students, and to draw implications from them, in the context of a high youth unemployment rate and the polarization of social classes. For this purpose, we selected 13 university students who received scholarships from the S scholarship foundation and analyzed six rounds of interviews with them using text mining techniques. According to the results, the students appear to be influenced by home environment and income level when recalling previous academic experience or planning a career during the interviews. These influences were also found to differ by gender and major. This study is meaningful in that qualitative research data are analyzed by applying text mining techniques in a convergent way. As a result, the college life and career preparation of low-income university students were explored through the frequency of words and the relations among them.

Methodology for Issue-related R&D Keywords Packaging Using Text Mining (텍스트 마이닝 기반의 이슈 관련 R&D 키워드 패키징 방법론)

  • Hyun, Yoonjin;Shun, William Wong Xiu;Kim, Namgyu
    • Journal of Internet Computing and Services / v.16 no.2 / pp.57-66 / 2015
  • Considerable research efforts are being directed towards analyzing unstructured data such as text files and log files using commercial and noncommercial analytical tools. In particular, researchers are trying to extract meaningful knowledge through text mining not only in business but also in many other areas such as politics, economics, and cultural studies. For instance, several studies have examined national pending issues by analyzing large volumes of text on various social issues. However, it is difficult to provide successful information services that can identify R&D documents on specific national pending issues. While users may specify certain keywords relating to national pending issues, they usually fail to retrieve appropriate R&D information, primarily because of discrepancies between these terms and the corresponding terms actually used in the R&D documents. Thus, we need an intermediate logic to overcome these discrepancies and to identify and package appropriate R&D information on specific national pending issues. To address this requirement, three methodologies are proposed in this study: a hybrid methodology for extracting and integrating keywords pertaining to national pending issues, a methodology for packaging R&D information that corresponds to national pending issues, and a methodology for constructing an associative issue network based on relevant R&D information. Data analysis techniques such as text mining, social network analysis, and association rules mining are utilized for establishing these methodologies. In the experiments, the keyword enhancement rate achieved by the proposed integration methodology was about 42.8%. For the second objective, three key analyses were conducted and a number of association rules between national pending issue keywords and R&D keywords were derived. The experiment for the third objective, issue clustering based on R&D keywords, is still in progress and is expected to give tangible results in the future.

An Implementation of Mining Prototype System for Network Attack Analysis (네트워크 공격 분석을 위한 마이닝 프로토타입 시스템 구현)

  • Kim, Eun-Hee;Shin, Moon-Sun;Ryu, Keun-Ho
    • The KIPS Transactions:PartC / v.11C no.4 / pp.455-462 / 2004
  • With the development of the Internet, network attacks have become more varied, and new types continue to appear. Existing intrusion detection systems need a lot of effort and cost to detect and respond to unknown or modified attacks, because their detection is based on signatures of known attacks. In this paper, we present the design and implementation of a mining prototype system that predicts unknown or modified attacks through analysis of network protocol attributes. To analyze the attributes of network protocols, we use association rules and frequent episodes. The collected network protocol data are stored in schemas for TCP, UDP, ICMP, and an integrated type. We generate rules that can predict the types of network attacks. From the perspective of an intrusion detection system, our mining prototype is useful as an additional tool for responding to new attacks.

Analysis of Dental Hygienist Job Recognition Using Text Mining

  • Kim, Bo-Ra;Ahn, Eunsuk;Hwang, Soo-Jeong;Jeong, Soon-Jeong;Kim, Sun-Mi;Han, Ji-Hyoung
    • Journal of dental hygiene science / v.21 no.1 / pp.70-78 / 2021
  • Background: The aim of this study was to analyze the public demand for information about the job of dental hygienists by mining text data collected from the online Q&A section of an Internet portal site. Methods: Text data were collected from inquiries that were posted on the Naver Q&A section from January 2003 to July 2020 using "dental hygienist job recognition," "role recognition," "medical assistance," and "scaling" as search keywords. Text mining techniques were used to identify significant Korean words and their frequency of occurrence. In addition, the association between words was analyzed. Results: A total of 10,753 Korean words related to the job of dental hygienists were extracted from the text data. "Chi-lyo (treatment)," "chigwa (dental clinic)," "ske-illing (scaling)," "itmom (gum)," and "chia (tooth)" were the five most frequently used words. The words were classified into the following areas of the dental hygienist's job: periodontal disease treatment and prevention, medical assistance, patient care and consultation, and others. Among these areas, the number of words related to medical assistance was the largest, with sixty-six association rules found between the words, and "chi-lyo," "chigwa," and "ske-illing" as core words. Conclusion: The public demand for information about the job of dental hygienists was mainly related to "chi-lyo," "chigwa," and "ske-illing" as core words, demonstrating that scaling is recognized by the public as the job of a dental hygienist. However, the high demand for information related to treatment and medical assistance in the context of dental hygienists indicates that the public sees the job of dental hygienists as more focused on medical assistance than on preventive dental care that is provided with job autonomy.

Generally non-linear regression model containing standardized lift for association number estimation (연관성 규칙 수의 추정을 위한 일반적인 비선형 회귀모형에서의 표준화 향상도 활용 방안)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.27 no.3 / pp.629-638 / 2016
  • Among data mining techniques, the association rule is one of the most widely used in practice because it clearly displays the relationship between two or more items in large databases by quantifying that relationship. There are three primary quality measures for association rules: support, confidence, and lift. We evaluate association rules using these measures. Previous literature has estimated the number of association rules with either a determination function method or a regression modeling approach. In this paper, we propose several non-linear regression equations useful for estimating the number of rules and also evaluate the estimated association rules using the quality measures. Furthermore, we assess their usefulness compared with conventional regression models using the values of regression coefficients, F statistics, adjusted coefficients of determination, and variance inflation factors.
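
The three quality measures named in this abstract have standard textbook definitions, which can be sketched as follows. This is a minimal illustration on hypothetical toy transactions, not the paper's data or its regression models; the function and variable names are assumptions made for the example.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy transactions (not from the paper).
transactions: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]


def support(itemset: Iterable[str], txs: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(items <= t for t in txs) / len(txs)


def confidence(antecedent, consequent, txs) -> float:
    """Estimate of P(consequent | antecedent): support(A and B) / support(A)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, txs) / support(antecedent, txs)


def lift(antecedent, consequent, txs) -> float:
    """Confidence divided by the consequent's support; 1.0 means independence."""
    return confidence(antecedent, consequent, txs) / support(consequent, txs)


# Rule {bread} -> {milk} on the toy data.
print(support({"bread", "milk"}, transactions))   # 0.5
print(confidence({"bread"}, {"milk"}, transactions))
print(lift({"bread"}, {"milk"}, transactions))
```

In the toy data the lift of {bread} → {milk} is below 1, meaning the two items appear together slightly less often than independence would predict.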

Analysis of Technology Association Rules Between CPC Codes of the 'Internet of Things(IoT)' Patent (CPC 코드 기반 사물인터넷(IoT) 특허의 기술 연관성 규칙 분석)

  • Shim, Jaeruen
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.12 no.5 / pp.493-498 / 2019
  • This study analyzes the technology association rules between CPC codes of Internet of Things (IoT) patents, a core ICT-based technology of the Fourth Industrial Revolution. The association rules between CPC codes were extracted using R, an open-source tool for data mining. To this end, we analyzed, up to the subclass level, the 369 patents with a complex CPC code among the 605 patents related to the Internet of Things filed with the Patent Office until July 2019. Among the resulting technology association rules, those with high support were [H04W $\rightarrow$ H04L](18.2%), [H04L $\rightarrow$ H04W](18.2%), [G06Q $\rightarrow$ H04L](17.3%), [H04L $\rightarrow$ G06Q](17.3%), [H04W $\rightarrow$ G06Q](9.8%), [G06Q $\rightarrow$ H04W](9.8%), [G06F $\rightarrow$ H04L](7.9%), [H04L $\rightarrow$ G06F](7.9%), [G06F $\rightarrow$ G06Q](6.2%), [G06Q $\rightarrow$ G06F](6.2%). Analysis of the technology interconnection network shows that the core CPC codes in these association rules are G06Q and H04L. The results of this study can be used to predict future patent trends.
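
The support figures listed in this abstract come in equal pairs because support depends only on how often two codes co-occur, so a rule and its reverse share one value. The sketch below illustrates that on hypothetical patents; the study itself used R, so this Python version and its `pair_support` helper are assumptions for illustration only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical patents, each reduced to its set of CPC subclass codes.
patents = [
    {"H04W", "H04L"},
    {"H04L", "G06Q"},
    {"H04W", "H04L", "G06Q"},
    {"G06F", "H04L"},
    {"G06Q", "G06F"},
]


def pair_support(docs):
    """Support of each unordered code pair: co-occurrence count / number of docs."""
    counts = Counter()
    for codes in docs:
        # Sort so that each pair has one canonical (alphabetical) key.
        for pair in combinations(sorted(codes), 2):
            counts[pair] += 1
    return {pair: c / len(docs) for pair, c in counts.items()}


supports = pair_support(patents)
# Print pairs from highest to lowest support.
for (a, b), s in sorted(supports.items(), key=lambda kv: -kv[1]):
    # The same value applies to both directions, a -> b and b -> a.
    print(f"[{a} -> {b}] and [{b} -> {a}]: support = {s:.1%}")
```

Confidence, unlike support, is direction-dependent, so it would have to be computed separately for each direction of a pair.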

Effect of Market Basket Size on the Accuracy of Association Rule Measures (장바구니 크기가 연관규칙 척도의 정확성에 미치는 영향)

  • Kim, Nam-Gyu
    • Asia pacific journal of information systems / v.18 no.2 / pp.95-114 / 2008
  • Recent interest in data mining results from the growth of business data and the growing business need to extract valuable knowledge from the data and use it in decision making. In particular, recent advances in association rule mining techniques enable us to acquire knowledge about sales patterns among individual items from voluminous transactional data. One of the major purposes of association rule mining is to use the acquired knowledge in marketing strategies such as cross-selling, sales promotion, and shelf-space allocation. Despite this potential applicability, the marketing mix obtained from data mining often fails to yield realized profit. The main difficulty of mining-based profit realization is that association rule mining discovers a tremendous number of patterns. Because there are so many patterns, data mining experts must perform additional mining on the results of the initial mining in order to extract only actionable and profitable knowledge, which consumes considerable time and cost. In the literature, a number of interestingness measures have been devised for evaluating discovered patterns. Most of the measures can be calculated directly from what is known as a contingency table, which summarizes the sales frequencies of exclusive items or itemsets. A contingency table can provide brief insights into the relationship between two or more itemsets of concern. However, it is important to note that some useful information concerning sales transactions may be lost when a contingency table is constructed. For instance, information regarding the size of each market basket (i.e., the number of items in each transaction) cannot be described in a contingency table. It is natural that a larger basket tends to contain more sales patterns. Therefore, if two itemsets are sold together in a very large basket, it can be expected that the basket contains two or more patterns and that the two itemsets belong to mutually different patterns. We should therefore classify frequent itemsets into two categories, inter-pattern co-occurrence and intra-pattern co-occurrence, and investigate the effect of the market basket size on the two categories. This notion implies that any interestingness measure for association rules should consider not only the total frequency of target itemsets but also the size of each basket. There have been many attempts to analyze various interestingness measures in the literature. Most of them have conducted qualitative comparisons among the measures: the studies proposed desirable properties of interestingness measures and then surveyed how many properties each measure satisfies. However, relatively little attention has been paid to evaluating how valuable the patterns discovered by each measure are in the real world. In this paper, two notions regarding association rule measures are proposed. First, a quantitative criterion for estimating the accuracy of association rule measures is presented. According to this criterion, a measure can be considered accurate if it assigns high scores to meaningful patterns that actually exist and low scores to arbitrary patterns that co-occur by coincidence. Next, complementary measures are presented to improve the accuracy of traditional association rule measures. By adopting the factor of market basket size, the devised measures attempt to discriminate the co-occurrence of itemsets in a small basket from a co-occurrence in a large basket. Intensive computer simulations under various workloads were performed in order to analyze the accuracy of various interestingness measures, including traditional measures and the proposed measures.
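
The point that a contingency table discards basket size can be illustrated with a short sketch. It builds a 2x2 table for two itemsets and shows two hypothetical transaction sets, one with small baskets and one with large baskets, that produce the same table. The data and the `contingency_table` helper are assumptions for illustration; the paper's proposed complementary measures are not reproduced here.

```python
from typing import FrozenSet, List, Tuple


def contingency_table(
    txs: List[FrozenSet[str]], x, y
) -> Tuple[int, int, int, int]:
    """Return counts (both, x_only, y_only, neither) for itemsets x and y."""
    x, y = frozenset(x), frozenset(y)
    both = x_only = y_only = neither = 0
    for t in txs:
        has_x, has_y = x <= t, y <= t
        if has_x and has_y:
            both += 1
        elif has_x:
            x_only += 1
        elif has_y:
            y_only += 1
        else:
            neither += 1
    return both, x_only, y_only, neither


# Hypothetical data: the co-occurrence of "a" and "b" happens in a
# 2-item basket in `small` but in a 6-item basket in `large`.
small = [frozenset({"a", "b"}), frozenset({"a"}), frozenset({"c"})]
large = [
    frozenset({"a", "b", "c", "d", "e", "f"}),
    frozenset({"a", "g", "h"}),
    frozenset({"c"}),
]

print(contingency_table(small, {"a"}, {"b"}))  # (1, 1, 0, 1)
print(contingency_table(large, {"a"}, {"b"}))  # (1, 1, 0, 1)
```

Any measure computed only from these four counts gives the same score in both cases, which is the gap the abstract's basket-size-aware measures aim to close.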

Discovering Sequence Association Rules for Protein Structure Prediction (단백질 구조 예측을 위한 서열 연관 규칙 탐사)

  • Kim, Jeong-Ja;Lee, Do-Heon;Baek, Yun-Ju
    • The KIPS Transactions:PartD / v.8D no.5 / pp.553-560 / 2001
  • Bioinformatics is a discipline that supports biological experiment projects by storing and managing data arising from genome research. It can also guide experimental design for genome function prediction and regulation. Among the various approaches to genome research, proteomics has been drawing increasing attention because it deals directly with the final products of genomes, i.e., proteins. This paper proposes a data mining technique to predict the structural characteristics of a given protein group, which are one of the dominant factors in its function. After explaining associations among amino acid subsequences in the primary structures of proteins, which can provide important clues for determining their secondary or tertiary structures, it defines a sequence association rule to represent the relationships between subsequences. It also provides support and confidence measures, newly designed to evaluate the usefulness of sequence association rules. After proposing a method to discover useful sequence association rules from a given protein group, it evaluates the performance of the proposed method with protein sequence data from the SWISS-PROT protein database.

Korea's Trade Rules Analysis using Topic Modeling : from 2000 to 2022 (토픽 모델링을 이용한 한국 무역규범 연구동향 분석 : 2000년~2022년)

  • Byeong-Ho Lim;Jeong-In Chang;Tae-Han Kim;Ha-Neul Han
    • Korea Trade Review / v.48 no.1 / pp.55-81 / 2023
  • The purpose of this study is to analyze the main issues and trends of Korean trade, and to draw implications for future research regarding trade rules. A total of 476 academic journal articles, retrieved by searching the Korean Journal Citation Index database for the English keyword 'Trade Rules' from 2000 to July 2022, are analyzed. The analysis methodology includes co-occurrence network analysis and topic trend analysis, both of which are text mining methods. The results show that the keywords representing Korea's trade trends fall into four categories in which the number of research articles has rapidly increased: Topic 4 (Investment Treaty), Topic 7 (Trade Security), Topic 8 (China's Protectionism), and Topic 11 (Trade Settlement). The major background for these topics is the tension between the United States and China, which threatens the existing international trade system. Detailed studies of China's protectionism, changes in the trade security system, new investment agreements, and changes in payment methods will be the challenges in the near future.

Transaction Pattern Discrimination of Malicious Supply Chain using Tariff-Structured Big Data (관세 정형 빅데이터를 활용한 우범공급망 거래패턴 선별)

  • Kim, Seongchan;Song, Sa-Kwang;Cho, Minhee;Shin, Su-Hyun
    • The Journal of the Korea Contents Association / v.21 no.2 / pp.121-129 / 2021
  • In this study, we try to minimize tariff risk by constructing a hazardous cargo screening model based on Association Rule Mining, one of the data mining techniques. For this, the risk level between supply chains is calculated with the Apriori algorithm, an association analysis algorithm, using big data from the import declaration forms of the Korea Customs Service (KCS). We perform data preprocessing and association rule mining to generate a model to be used in screening the supply chain. In the preprocessing step, we extract the attributes required for rule generation from the import declaration data after removing errors. Then, we generate the rules by using the extracted attributes as inputs to the Apriori algorithm. The generated association rule model is loaded into the KCS screening system. When an import declaration to be checked is received, the screening system refers to the model and returns a confidence value based on the supply chain information in the import declaration data. The result is used to determine whether to inspect the import case. In 5-fold cross-validation, with import declaration data covering 2 years and 6 months divided into training data and test data, the model achieved 16.6% precision and 33.8% recall. This is about 3.4 times higher in precision and 1.5 times higher in recall than frequency-based methods. This confirms that the proposed method is an effective way to reduce tariff risks.
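
The pipeline described in this abstract (mine frequent itemsets with Apriori, then look up a rule's confidence for an incoming declaration) can be sketched in miniature. Everything below is an assumption for illustration: the attribute names (`exporter=E1`, `forwarder=F1`, `violation`), the toy records, and the threshold are hypothetical, and the actual KCS attributes and model are not described in the abstract.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori: return {itemset: support} for all frequent itemsets."""
    tx = [frozenset(t) for t in transactions]
    n = len(tx)
    # Level 1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for t in tx for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count each candidate and keep those meeting the support threshold.
        level = {}
        for c in candidates:
            s = sum(c <= t for t in tx) / n
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        # Join: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def rule_confidence(frequent, antecedent, consequent):
    """confidence(A -> B) = support(A and B) / support(A), from mined supports."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Hypothetical import declarations; "violation" marks a past detected case.
declarations = [
    {"exporter=E1", "forwarder=F1", "violation"},
    {"exporter=E1", "forwarder=F1", "violation"},
    {"exporter=E1", "forwarder=F2"},
    {"exporter=E2", "forwarder=F1"},
    {"exporter=E2", "forwarder=F2"},
]

frequent = apriori(declarations, min_support=0.3)
# Screening a new declaration whose supply chain is (E1, F1):
score = rule_confidence(frequent, {"exporter=E1", "forwarder=F1"}, {"violation"})
print(score)
```

A screening system along these lines would compare the returned confidence with a cutoff to decide whether to inspect the cargo; how the cutoff is chosen trades precision against recall.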