• Title/Summary/Keyword: 규칙 생성과 평가

Search Result 195, Processing Time 0.025 seconds

Non-linear regression model considering all association thresholds for decision of association rule numbers (기본적인 연관평가기준 전부를 고려한 비선형 회귀모형에 의한 연관성 규칙 수의 결정)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society
    • /
    • v.24 no.2
    • /
    • pp.267-275
    • /
    • 2013
  • Among data mining techniques, the association rule is the most recently developed technique, and it finds the relevance between two items in a large database. And it is directly applied in the field because it clearly quantifies the relationship between two or more items. When we determine whether an association rule is meaningful, we utilize interestingness measures such as support, confidence, and lift. Interestingness measures are meaningful in that it shows the causes for pruning uninteresting rules statistically or logically. But the criteria of these measures are chosen by experiences, and the number of useful rules is hard to estimate. If too many rules are generated, we cannot effectively extract the useful rules.In this paper, we designed a variety of non-linear regression equations considering all association thresholds between the number of rules and three interestingness measures. And then we diagnosed multi-collinearity and autocorrelation problems, and used analysis of variance results and adjusted coefficients of determination for the best model through numerical experiments.

Automatic Generation of Concatenate Morphemes for Korean LVCSR (대어휘 연속음성 인식을 위한 결합형태소 자동생성)

  • 박영희;정민화
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.4
    • /
    • pp.407-414
    • /
    • 2002
  • In this paper, we present a method that automatically generates concatenate morpheme based language models to improve the performance of Korean large vocabulary continuous speech recognition. The focus was brought into improvement against recognition errors of monosyllable morphemes that occupy 54% of the training text corpus and more frequently mis-recognized. Knowledge-based method using POS patterns has disadvantages such as the difficulty in making rules and producing many low frequency concatenate morphemes. Proposed method automatically selects morpheme-pairs from training text data based on measures such as frequency, mutual information, and unigram log likelihood. Experiment was performed using 7M-morpheme text corpus and 20K-morpheme lexicon. The frequency measure with constraint on the number of morphemes used for concatenation produces the best result of reducing monosyllables from 54% to 30%, bigram perplexity from 117.9 to 97.3. and MER from 21.3% to 17.6%.

A Rule-driven Automatic Learner Grouping System Supporting Various Class Types (다양한 수업 유형을 지원하는 규칙 기반 학습자 자동 그룹핑 시스템)

  • Kim, Eun-Hee;Park, Jong-Hyun;Kang, Ji-Hoon
    • Journal of The Korean Association of Information Education
    • /
    • v.14 no.3
    • /
    • pp.291-300
    • /
    • 2010
  • Group-based learning is known to be an effective means to improve scholastic achievement in online learning. Therefore, there are some previous researches for the group-based learning. A lot of previous researches define factors for grouping from the characteristics of classes, teacher's decision and students' preferences and then generate a group based on the defined factors. However, many algorithms proposed by previous researches depend on a specific class and is not a general approach since there exist several differences in terms of the need of courses, learners, and teachers. Moreover it is hard to find a automatic system for group generation. This paper proposes a grouping system which automatically generate a learner group according to characteristics of various classes. the proposed system automatically generates a learner group by using basic information for a class or additional factors inputted from a user. The proposed system defines a set of rules for learner grouping which enables automatic selection of a learner grouping algorithm tailored to the characteristics of a given class. This rule based approach allows the proposed system to accommodate various learner grouping algorithms for a later use. Also we show the usability of our system by serviceability evaluation.

  • PDF

Utilization of similarity measures by PIM with AMP as association rule thresholds (모든 주변 비율을 고려한 확률적 흥미도 측도 기반 유사성 측도의 연관성 평가 기준 활용 방안)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society
    • /
    • v.24 no.1
    • /
    • pp.117-124
    • /
    • 2013
  • Association rule of data mining techniques is the method to quantify the relationship between a set of items in a huge database, andhas been applied in various fields like internet shopping mall, healthcare, insurance, and education. There are three primary interestingness measures for association rule, support and confidence and lift. Confidence is the most important measure of these measures, and we generate some association rules using confidence. But it is an asymmetric measure and has only positive value. So we can face with difficult problems in generation of association rules. In this paper we apply the similarity measures by probabilistic interestingness measure (PIM) with all marginal proportions (AMP) to solve this problem. The comparative studies with support, confidences, lift, chi-square statistics, and some similarity measures by PIM with AMPare shown by numerical example. As the result, we knew that the similarity measures by PIM with AMP could be seen the degree of association same as confidence. And we could confirm the direction of association because they had the sign of their values, and select the best similarity measure by PIM with AMP.

Automatic Generation of Pronunciation Variants for Korean Continuous Speech Recognition (한국어 연속음성 인식을 위한 발음열 자동 생성)

  • 이경님;전재훈;정민화
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.2
    • /
    • pp.35-43
    • /
    • 2001
  • Many speech recognition systems have used pronunciation lexicon with possible multiple phonetic transcriptions for each word. The pronunciation lexicon is of often manually created. This process requires a lot of time and efforts, and furthermore, it is very difficult to maintain consistency of lexicon. To handle these problems, we present a model based on morphophon-ological analysis for automatically generating Korean pronunciation variants. By analyzing phonological variations frequently found in spoken Korean, we have derived about 700 phonemic contexts that would trigger the multilevel application of the corresponding phonological process, which consists of phonemic and allophonic rules. In generating pronunciation variants, morphological analysis is preceded to handle variations of phonological words. According to the morphological category, a set of tables reflecting phonemic context is looked up to generate pronunciation variants. Our experiments show that the proposed model produces mostly correct pronunciation variants of phonological words. Then we estimated how useful the pronunciation lexicon and training phonetic transcription using this proposed systems.

  • PDF

Comparison of confidence measures useful for classification model building (분류 모형 구축에 유용한 신뢰도 측도 간의 비교)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society
    • /
    • v.25 no.2
    • /
    • pp.365-371
    • /
    • 2014
  • Association rule of the well-studied techniques in data mining is the exploratory data analysis for understanding the relevance among the items in a huge database. This method has been used to find the relationship between each set of items based on the interestingness measures such as support, confidence, lift, similarity measures, etc. By typical association rule technique, we generate association rule that satisfy minimum support and confidence values. Support and confidence are the most frequently used, but they have the drawback that they can not determine the direction of the association because they have always positive values. In this paper, we compared support, basic confidence, and three kinds of confidence measures useful for classification model building to overcome this problem. The result confirmed that the causal confirmed confidence was the best confidence in view of the association mining because it showed more precisely the direction of association.

Fuzzy Inference Systems Based on FCM Clustering Algorithm for Nonlinear Process (비선형 공정을 위한 FCM 클러스터링 알고리즘 기반 퍼지 추론 시스템)

  • Park, Keon-Jun;Kang, Hyung-Kil;Kim, Yong-Kab
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.5 no.4
    • /
    • pp.224-231
    • /
    • 2012
  • In this paper, we introduce a fuzzy inference systems based on fuzzy c-means clustering algorithm for fuzzy modeling of nonlinear process. Typically, the generation of fuzzy rules for nonlinear processes have the problem that the number of fuzzy rules exponentially increases. To solve this problem, the fuzzy rules of fuzzy model are generated by partitioning the input space in the scatter form using FCM clustering algorithm. The premise parameters of the fuzzy rules are determined by membership matrix by means of FCM clustering algorithm. The consequence part of the rules is expressed in the form of polynomial functions and the coefficient parameters of each rule are determined by the standard least-squares method. And lastly, we evaluate the performance and the nonlinear characteristics using the data widely used in nonlinear process.

The Design and Implementation of Korean Text-to-Speech Conversion System on a Rule-Based Framework (한국어(韓國語) 규칙(規則) 음성(音聲) 합성(合成) 시스템의 구현(具現))

  • Son, Yung-Taek;Kim, Yong-Kap;Matsumoto, Tatsuro
    • Annual Conference on Human and Language Technology
    • /
    • 1993.10a
    • /
    • pp.141-148
    • /
    • 1993
  • 본고는, 한글 한자가 혼용된 입력 텍스트를 음성으로 변환 출력하는 포르만트 음성 합성 방식 즉, 한국어 규칙 음성 합성(이하에는 KTTS[Korean Text To Speech System]이라고 함)의 전반적인 처리 흐름에 대하여 소개한다. 특히, 입력 텍스트에 있어서, 한자 또는 각종 부호의 한글 변환 기능, 음성 출력용 문법 정보 추출에 필요한 입력문의 해석 및 구문경계 설정 기능, 또한 음소 기호 변환 및 파라메터 값 생성과 변경 처리기능을 중심으로 설명하고자 한다. 또한 본 시스템의 완성과 더불어 실시하였던 청취 실험 평가 결과에 대하여 덧붙이겠다.

  • PDF

Host-based intrusion detection research using CNN and Kibana (CNN과 Kibana를 활용한 호스트 기반 침입 탐지 연구)

  • Park, DaeKyeong;Shin, Dongkyoo;Shin, Dongil
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2020.11a
    • /
    • pp.920-923
    • /
    • 2020
  • 사이버 공격이 더욱 지능화됨에 따라 기존의 침입 탐지 시스템(Intrusion Detection System)은 기존의 저장된 패턴에서 벗어난 지능형 공격을 탐지하기에 적절하지 않다. 딥러닝(Deep Learning) 기반 침입 탐지는 새로운 탐지 규칙을 생성하는데 적절하다. 그 이유는 딥러닝은 데이터 학습을 통해 새로운 침입 규칙을 자체적으로 생성하기 때문이다. 침입 탐지 시스템 데이터 세트는 가장 널리 사용되는 KDD99 데이터와 LID-DS(Leipzig Intrusion Detection-Data Set)를 사용했다. 본 논문에서는 1차원 벡터를 이미지로 변환하고 CNN(Convolutional Neural Network)을 적용하여 두 데이터 세트에 대한 성능을 실험했다. 평가를 위해 Accuracy, Precision, Recall 및 F1-Score 지표를 측정했다. 그 결과 LID-DS 데이터 세트의 Accuracy가 KDD99 데이터 세트의 Accuracy 보다 약 8% 높은 것을 확인했다. 또한, 1차원 벡터에 대한 데이터를 Kibana를 사용하여 데이터를 시각화하여 대용량 데이터를 한눈에 보기 어려운 단점을 해결하는 방법을 제안한다.

Personalized Recommendation System using FP-tree Mining based on RFM (RFM기반 FP-tree 마이닝을 이용한 개인화 추천시스템)

  • Cho, Young-Sung;Ho, Ryu-Keun
    • Journal of the Korea Society of Computer and Information
    • /
    • v.17 no.2
    • /
    • pp.197-206
    • /
    • 2012
  • A exisiting recommedation system using association rules has the problem, such as delay of processing speed from a cause of frequent scanning a large data, scalability and accuracy as well. In this paper, using a Implicit method which is not used user's profile for rating, we propose the personalized recommendation system which is a new method using the FP-tree mining based on RFM. It is necessary for us to keep the analysis of RFM method and FP-tree mining to be able to reflect attributes of customers and items based on the whole customers' data and purchased data in order to find the items with high purchasability. The proposed makes frequent items and creates association rule by using the FP-tree mining based on RFM without occurrence of candidate set. We can recommend the items with efficiency, are used to generate the recommendable item according to the basic threshold for association rules with support, confidence and lift. To estimate the performance, the proposed system is compared with existing system. As a result, it can be improved and evaluated according to the criteria of logicality through the experiment with dataset, collected in a cosmetic internet shopping mall.