• Title/Summary/Keyword: Association Rule Mining (연관규칙마이닝)


Violation Pattern Analysis for Good Manufacturing Practice for Medicine using t-SNE Based on Association Rule and Text Mining (우수 의약품 제조 기준 위반 패턴 인식을 위한 연관규칙과 텍스트 마이닝 기반 t-SNE분석)

  • Jun-O, Lee;So Young, Sohn
    • Journal of Korean Society for Quality Management / v.50 no.4 / pp.717-734 / 2022
  • Purpose: This study aims to effectively detect concurrent violations of Good Manufacturing Practice that drug manufacturers have concealed. Methods: We present a framework for analyzing regulatory violation patterns using Association Rule Mining (ARM), text mining, and t-distributed Stochastic Neighbor Embedding (t-SNE) to increase the effectiveness of on-site inspection. Results: A number of simultaneous violation patterns were discovered by applying Association Rule Mining to FDA inspection data collected from October 2008 to February 2022. Among them were 'concurrent violation patterns' that arise because two or more regulations cover similar regulatory ranges. Such patterns do not help predict violations that appear simultaneously but belong to different regulations, so these unnecessary patterns were excluded by applying t-SNE based on text mining. Conclusion: The proposed approach enables the recognition of simultaneous violation patterns during on-site inspection. It is expected to shorten detection time by increasing the likelihood of finding intentionally concealed violations.

Sentiment Analysis and Issue Mining on All-Solid-State Battery Using Social Media Data (소셜미디어 분석을 통한 전고체 배터리 감성분석과 이슈 탐색)

  • Lee, Ji Yeon;Lee, Byeong-Hee
    • The Journal of the Korea Contents Association / v.22 no.10 / pp.11-21 / 2022
  • All-solid-state batteries are among the promising candidates for next-generation batteries and are drawing attention as a key component that will lead the future electric vehicle industry. This study analyzes 10,280 comments posted from 2016 to 2021 on Reddit, a global social media platform, in order to identify policy issues and public interest related to all-solid-state batteries. Text mining techniques such as frequency analysis, association rule analysis, and topic modeling, together with sentiment analysis, are applied to the collected data to grasp global trends, compare them with the South Korean government's all-solid-state battery development strategy, and suggest policy directions for national research and development. The overall sentiment toward all-solid-state battery issues was positive, with 50.5% positive and 39.5% negative comments. An analysis of detailed emotions further showed that the public had trust in and expectations for all-solid-state batteries, although concern about unresolved problems coexisted. This study makes an academic and practical contribution in that it presents a text mining method for deriving key issues related to all-solid-state batteries, and a more comprehensive trend analysis that combines a top-down approach based on government policy analysis with a bottom-up approach that analyzes public perception.

Usefulness of Data Mining in Criminal Investigation (데이터 마이닝의 범죄수사 적용 가능성)

  • Kim, Joon-Woo;Sohn, Joong-Kweon;Lee, Sang-Han
    • Journal of forensic and investigative science / v.1 no.2 / pp.5-19 / 2006
  • Data mining is an information extraction activity that discovers hidden facts contained in databases. Using a combination of machine learning, statistical analysis, modeling techniques, and database technology, data mining finds patterns and subtle relationships in data and infers rules that allow the prediction of future results. Typical applications include market segmentation, customer profiling, fraud detection, evaluation of retail promotions, and credit risk analysis. Law enforcement agencies handle large volumes of data when investigating crimes, and the volume is growing as computerized data processing develops. Discovering knowledge in that data is a new challenge. Data mining can be applied in criminal investigation to find offenders by analyzing complex relational data structures and free text drawn from criminal records or statements. This study aimed to evaluate the possible applications of data mining and its limitations in practical criminal investigation. Clustering of criminal cases is feasible for habitual crimes such as fraud and burglary when data mining is used to identify crime patterns. Neural network modeling, one of the tools of data mining, can be applied to comparing a suspect's photograph or handwriting with that of a convict, or to criminal profiling. A case study of insurance fraud in practice showed that data mining was useful for organized crimes such as gang activity, terrorism, and money laundering. However, the products of data mining in criminal investigation should be evaluated with caution, because data mining offers only a clue rather than a conclusion. Legal regulation is needed to prevent abuse by law enforcement agencies and to protect personal privacy and human rights.


Emotion Prediction of Paragraph using Big Data Analysis (빅데이터 분석을 이용한 문단 내의 감정 예측)

  • Kim, Jin-su
    • Journal of Digital Convergence / v.14 no.11 / pp.267-273 / 2016
  • The creation and sharing of information, including structured data as well as various kinds of unstructured data, is actively progressing with the spread of mobile devices. Recently, big data analysis has been used to extract semantic information from SNS, and data mining is one of the big data techniques. In particular, general emotion analysis, which expresses the collective intelligence of the masses, draws on large and varied materials. In this paper, we propose an emotion prediction system architecture that extracts significant keywords from social network paragraphs using n-grams and a Korean morphological analyzer, and predicts emotion using an SVM with the extracted emotion features. The proposed system showed an average improvement of 82.25% in recall rate over previous systems, and it will help extract semantic keywords using morphological analysis.

An Investigation on Expanding Co-occurrence Criteria in Association Rule Mining (온라인 연관관계 분석의 장바구니 기준에 대한 연구)

  • Kim, Mi-Sung;Kim, Nam-Gyu
    • CRM연구 / v.4 no.2 / pp.19-29 / 2011
  • Purchasing patterns in an online shopping mall differ greatly from those in an offline market. This difference may be caused mainly by the difference in accessibility between online and offline markets. The interval between the initial purchasing decision and its realization tends to be relatively short in an online shopping mall, because a customer can place an order immediately. Because of this short interval, an online shopping mall transaction usually contains fewer items than an offline one. In an offline market, customers usually keep some items in mind and buy them all at once a few days after deciding to buy them, instead of buying each item individually and immediately. In contrast, more than 70% of online shopping mall transactions contain only one item. This statistic implies that traditional data mining techniques cannot be directly applied to online market analysis, because hardly any association rules survive with an acceptable level of Support when there are too many Null Transactions. Most market basket analyses of online shopping mall transactions have therefore been performed by expanding the co-occurrence criterion of traditional association rule mining. While the traditional co-occurrence criterion defines items purchased in one transaction as concurrently purchased items, an expanded co-occurrence criterion regards items purchased by a customer during some predefined period (e.g., a day) as concurrently purchased items. In studies using expanded co-occurrence criteria, however, the criterion has been defined arbitrarily by researchers without any theoretical grounds or agreement. The lack of clear grounds for adopting a particular co-occurrence criterion degrades the reliability of the analytical results. Moreover, it is hard to derive new meaningful findings by combining the outcomes of previous individual studies.
In this paper, we compare expanded co-occurrence criteria and propose a guideline for selecting an appropriate one. First, we compare the accuracy of association rules discovered under various co-occurrence criteria. Through this experiment we expect to provide a guideline for selecting a co-occurrence criterion that corresponds to the purpose of the analysis. Additionally, we perform similar experiments with several groups of customers segmented by each customer's average duration between orders. Through this experiment, we attempt to discover the relationship between the optimal co-occurrence criterion and the customer's average duration between orders. Finally, through this series of experiments, we expect to provide basic guidelines for developing customized recommendation systems.


Design and Analysis of Efficient Operation Sequencing in FMC Robot Using Simulation and Sequential Patterns (시뮬레이션과 순차 패턴을 이용한 FMC 로봇의 효율적 작업 순서 설계 및 분석)

  • Kim, Sun-Gil;Kim, Youn-Jin;Lee, Hong-Chul
    • Journal of the Korea Academia-Industrial cooperation Society / v.11 no.6 / pp.2021-2029 / 2010
  • This paper proposes a method for designing and analyzing an FMC robot's dispatching rule using simulation and sequential patterns. First, we built an FMC model using simulation, extracted the signals with which facilities call the robot, and saved them as logs. Second, we derived the robot's optimal path using Sequential Pattern Mining on the results of analyzing the logs and the relationship between machine and robot actions. Lastly, we applied it to A Corp.'s manufacturing line to verify its performance. As a result of applying the new dispatching rule in the FMC, total throughput and total flow time decreased because of reduced material loss time and increased robot utility. Furthermore, because this method can be applied to any manufacturing plant using simulation, it can also help improve overall FMC efficiency.

An Investigation on Expanding Co-occurrence Criteria in Association Rule Mining (연관규칙 마이닝에서의 동시성 기준 확장에 대한 연구)

  • Kim, Mi-Sung;Kim, Nam-Gyu;Ahn, Jae-Hyeon
    • Journal of Intelligence and Information Systems / v.18 no.1 / pp.23-38 / 2012
  • Purchasing patterns in an online shopping mall differ greatly from those in an offline market. This difference may be caused mainly by the difference in accessibility between online and offline markets. The interval between the initial purchasing decision and its realization tends to be relatively short in an online shopping mall, because a customer can place an order immediately. Because of this short interval, an online shopping mall transaction usually contains fewer items than an offline one. In an offline market, customers usually keep some items in mind and buy them all at once a few days after deciding to buy them, instead of buying each item individually and immediately. In contrast, more than 70% of online shopping mall transactions contain only one item. This statistic implies that traditional data mining techniques cannot be directly applied to online market analysis, because hardly any association rules survive with an acceptable level of Support when there are too many Null Transactions. Most market basket analyses of online shopping mall transactions have therefore been performed by expanding the co-occurrence criterion of traditional association rule mining. While the traditional co-occurrence criterion defines items purchased in one transaction as concurrently purchased items, an expanded co-occurrence criterion regards items purchased by a customer during some predefined period (e.g., a day) as concurrently purchased items. In studies using expanded co-occurrence criteria, however, the criterion has been defined arbitrarily by researchers without any theoretical grounds or agreement. The lack of clear grounds for adopting a particular co-occurrence criterion degrades the reliability of the analytical results. Moreover, it is hard to derive new meaningful findings by combining the outcomes of previous individual studies.
In this paper, we compare expanded co-occurrence criteria and propose a guideline for selecting an appropriate one. First, we compare the accuracy of association rules discovered under various co-occurrence criteria. Through this experiment we expect to provide a guideline for selecting a co-occurrence criterion that corresponds to the purpose of the analysis. Additionally, we perform similar experiments with several groups of customers segmented by each customer's average duration between orders. Through this experiment, we attempt to discover the relationship between the optimal co-occurrence criterion and the customer's average duration between orders. Finally, through this series of experiments, we expect to provide basic guidelines for developing customized recommendation systems. Our experiments use a real dataset acquired from one of the largest internet shopping malls in Korea. We use 66,278 transactions of 3,847 customers conducted during the last two years. Overall results show that the accuracy of association rules for frequent shoppers (whose average duration between orders is relatively short) is higher than that for casual shoppers. In addition, we find that for frequent shoppers the accuracy of association rules is very high when the co-occurrence criterion of the training set matches that of the validation set (i.e., the target set). This implies that the co-occurrence criterion for frequent shoppers should be set according to the application purpose period. For example, an analyst should use a day as the co-occurrence criterion when offering potential customers a coupon that is valid only for a day. In contrast, an analyst should use a month as the co-occurrence criterion when publishing a coupon book that can be used for a month.
In the case of casual shoppers, the accuracy of association rules appears not to be affected by the period of the application purpose. The accuracy of casual shoppers' association rules becomes higher when a longer co-occurrence criterion is adopted. This implies that an analyst should set the co-occurrence criterion to be as long as possible, regardless of the application purpose period.
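The difference between the traditional and expanded co-occurrence criteria described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `build_baskets`, the tuple layout, and the sample purchases are all hypothetical, chosen only to show how the same purchase log yields different baskets depending on the criterion.

```python
from collections import defaultdict
from datetime import date


def build_baskets(purchases, period="order"):
    """Group purchases into baskets under a chosen co-occurrence criterion.

    purchases: iterable of (customer_id, order_id, purchase_date, item).
    period: "order" (traditional: one basket per transaction), or
            "day" / "month" (expanded: one basket per customer per period).
    """
    baskets = defaultdict(set)
    for customer, order, day, item in purchases:
        if period == "order":
            key = (customer, order)
        elif period == "day":
            key = (customer, day)
        elif period == "month":
            key = (customer, day.year, day.month)
        else:
            raise ValueError(f"unknown period: {period}")
        baskets[key].add(item)
    return list(baskets.values())


# Hypothetical purchase log: customer c1 places three single-item orders.
purchases = [
    ("c1", "o1", date(2011, 3, 1), "milk"),
    ("c1", "o2", date(2011, 3, 1), "bread"),
    ("c1", "o3", date(2011, 3, 20), "eggs"),
    ("c2", "o4", date(2011, 3, 5), "milk"),
]

by_order = build_baskets(purchases, "order")  # every basket holds one item
by_day = build_baskets(purchases, "day")      # milk and bread now co-occur
by_month = build_baskets(purchases, "month")  # milk, bread and eggs co-occur
```

Under the per-order criterion every basket here contains a single item, so no rule can be found; widening the period to a day or a month is what lets items co-occur at all.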

Study for Analyzing Defense Industry Technology using Datamining technique: Patent Analysis Approach (데이터마이닝을 통한 방위산업기술 분석 연구: 특허분석을 중심으로)

  • Son, Changho
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.10 / pp.101-107 / 2018
  • Korea's defense industry has recently advanced considerably, and the share of defense R&D in the defense budget is gradually increasing. However, without objective analysis of defense industry technology, effective defense R&D activities are limited and defense budgets may be used inefficiently. Therefore, in addition to analyses of defense industry technology that reflect expert opinion, this paper aims to analyze the technology objectively by quantitative methods and to support efficient use of the defense budget. We propose a patent analysis method that identifies the characteristics of defense industry technology and vacant technology objectively and systematically by applying big data analysis, one of the keywords of the 4th industrial revolution, to defense industry technology. The proposed method is applied in a case analysis of firepower technology, one of several defense industrial technologies. In the process, the patents of 10 domestic firepower-related companies, selected from the defense company classification of the Korea Defense Industry Association (KDIA), were collected through KIPRIS, and the data matrix was preprocessed to use the IPC codes among them. We then implemented association rule mining, a data mining technique that can identify the relations between items, using the R program. The results of this study are presented through interpretation of the support, confidence, and lift indices produced by the suggested approach. This paper therefore suggests that the approach can help the efficient use of the massive national defense budget and enhance the competitiveness of defense industry technology.
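The support, confidence, and lift indices mentioned in this abstract follow the standard association rule definitions, which can be sketched briefly. The abstract states that the study used R; the Python below is only an illustration under assumptions. The helper `rule_metrics` is hypothetical, and the four sample "patents" with IPC subclass codes are made-up data, not results from the paper.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    support    = P(A and C)
    confidence = P(A and C) / P(A)
    lift       = confidence / P(C)
    """
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    count_a = sum(antecedent <= t for t in transactions)
    count_c = sum(consequent <= t for t in transactions)
    count_ac = sum((antecedent | consequent) <= t for t in transactions)
    support = count_ac / n
    confidence = count_ac / count_a if count_a else 0.0
    lift = confidence / (count_c / n) if count_c else 0.0
    return support, confidence, lift


# Illustrative data only: each set is the IPC codes assigned to one patent.
patents = [
    {"F41A", "F42B"},
    {"F41A", "F42B", "F41G"},
    {"F41A"},
    {"F42B"},
]

support, confidence, lift = rule_metrics(patents, {"F41A"}, {"F42B"})
```

A lift above 1 suggests the two codes appear together more often than independence would predict; a lift below 1, as in this toy data, suggests the opposite.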

Association Analysis of Product Sales using Sequential Layer Filtering (순차적 레이어 필터링을 이용한 상품 판매 연관도 분석)

  • Sun-Ho Bang;Kang-Hyun Lee;Ji-Young Jang;Tsatsral Telmentugs;Kwnag-Sup Shin
    • The Journal of Bigdata / v.7 no.1 / pp.213-224 / 2022
  • In logistics and distribution, Market Basket Analysis (MBA) is used as an important means of analyzing the correlation between major sales products and increasing internal operational efficiency. In particular, the results of market basket analysis are used as important reference data for decision-making processes such as product purchase prediction, product recommendation, and in-store product display structure. With the recent development of e-commerce, the number of items handled by a single distribution and logistics company has rapidly increased, and existing analytical methods such as Apriori and FP-Growth have slowed down because of the exponential increase in the amount of calculation, which limits their use in actual business for examining important association rules. To overcome this limitation, this study first selected groups of products that are mainly sold together at the Main-Category level, the highest level of the product classification system, using a utility itemset mining technique that can also take product sales volume into account. Then, at the sub-category level, the types of products sold together were identified using FP-Growth. This sequential layer filtering technique may make it possible to reduce unnecessary calculations and to find practically usable rules for enhancing effectiveness and profitability.

An Algorithm for reducing the search time of Frequent Items (빈발 항목의 탐색 시간을 단축하기 위한 알고리즘)

  • Yun, So-Young;Youn, Sung-Dae
    • Journal of the Korea Institute of Information and Communication Engineering / v.15 no.1 / pp.147-156 / 2011
  • With the increasing use of information systems, methods for rapidly picking out necessary products from large amounts of data have been studied. Association rule search methods for finding hidden patterns have drawn much attention, and the Apriori algorithm is a major method among them. However, the Apriori algorithm increases search time because of its repeated scans. This paper proposes an algorithm to reduce the search time for frequent items. The proposed algorithm creates a matrix from the transaction database and searches for frequent items using the mean number of items per transaction in the matrix and a defined minimum support. The mean number of items per transaction is used to reduce the number of transactions, and the minimum support is used to cut down on items. The performance of the proposed algorithm is assessed by comparing its search time and precision with those of existing algorithms. The findings indicate that the proposed algorithm searches more quickly and efficiently when extracting the final frequent items than the existing Apriori and Matrix algorithms.
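The repeated-scan cost attributed to Apriori in this abstract can be seen in a minimal sketch of the baseline algorithm. This is a textbook-style Apriori, not the matrix-based algorithm the paper proposes, whose details the abstract does not give. The function name `apriori`, the use of an absolute count for `min_support`, and the sample database are assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets appearing in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while candidates:
        # One full pass over the database per level: the repeated scans
        # that make Apriori slow on large inputs.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-item candidates, then prune
        # any candidate that has an infrequent (k-1)-subset.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical transaction database.
db = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]

result = apriori(db, min_support=3)
```

With this sample, all single items and all pairs meet the threshold, while the triple {A, B, C} appears in only two transactions and is dropped after a third scan.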