• Title/Summary/Keyword: probabilistic data association (확률적 데이터 연관)

Exploration of PIM based similarity measures as association rule thresholds (확률적 흥미도를 이용한 유사성 측도의 연관성 평가 기준)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.23 no.6 / pp.1127-1135 / 2012
  • Association rule mining quantifies the relationships between sets of items in a large database, and the search for association rules is one of the well-studied problems in data mining. There are three primary quality measures for association rules: support, confidence, and lift. Rules are typically generated using confidence, the most important of these measures, but it is asymmetric and takes only positive values, which makes rule generation difficult. In this paper we apply similarity measures based on the probabilistic interestingness measure (PIM) to address this problem. A numerical example compares support, two confidences, lift, and several PIM-based similarity measures. The results show that the PIM-based similarity measures indicate the degree of association just as confidence does and, because their values carry a sign, also reveal the direction of the association.
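
For orientation, the three basic measures named in the abstract can be sketched as follows. This is a minimal illustration using the standard textbook definitions, not the paper's PIM-based similarity measures; the function name `rule_measures` and the toy transactions are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def rule_measures(
    transactions: List[Set[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> Dict[str, float]:
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    a, b = set(antecedent), set(consequent)
    n = len(transactions)
    if n == 0:
        raise ValueError("at least one transaction is required")

    n_a = sum(1 for t in transactions if a <= t)         # transactions containing A
    n_b = sum(1 for t in transactions if b <= t)         # transactions containing B
    n_ab = sum(1 for t in transactions if (a | b) <= t)  # transactions containing both

    support = n_ab / n                           # P(A and B)
    confidence = n_ab / n_a if n_a else 0.0      # P(B | A)
    lift = confidence / (n_b / n) if n_b else 0.0  # P(B | A) / P(B)
    return {"support": support, "confidence": confidence, "lift": lift}


# Hypothetical toy data: five transactions over items "a" and "b".
toy = [{"a", "b"}, {"a", "b"}, {"a"}, {"a"}, {"b"}]
forward = rule_measures(toy, ["a"], ["b"])
backward = rule_measures(toy, ["b"], ["a"])
```

On the toy data, the rule a -> b has confidence 0.5 while b -> a has confidence 2/3, whereas support (0.4) and lift (about 0.83) are the same in both directions. This is the asymmetry of confidence that the abstract refers to.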

Proposition of causally confirmed measures in association rule mining (인과적 확인 측도에 의한 연관성 규칙 탐색)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.25 no.4 / pp.857-868 / 2014
  • Data mining is a representative analysis methodology in the era of big data: the process of analyzing a massive database and summarizing it into meaningful information. The association rule technique finds relationships among items in a huge database using interestingness measures such as support, confidence, and lift. However, these measures cannot establish a causal relationship between antecedent and consequent item sets, nor do they indicate the direction of association. This paper proposes causally confirmed association thresholds to compensate for these problems and then checks the three conditions of interestingness measures. Simulation studies compare basic association thresholds, causal association thresholds, and causally confirmed association thresholds. The results show that causally confirmed association thresholds are better than basic and causal association thresholds.

Experimental Research on Radar and ESM Measurement Fusion Technique Using Probabilistic Data Association for Cooperative Target Tracking (협동 표적 추적을 위한 확률적 데이터 연관 기반 레이더 및 ESM 센서 측정치 융합 기법의 실험적 연구)

  • Lee, Sae-Woom;Kim, Eun-Chan;Jung, Hyo-Young;Kim, Gi-Sung;Kim, Ki-Seon
    • The Journal of Korean Institute of Communications and Information Sciences / v.37 no.5C / pp.355-364 / 2012
  • Target processing mechanisms are necessary for collecting target information, real-time data fusion, and tactical environment recognition in support of cooperative engagement capability. Among these mechanisms, target tracking begins by predicting the target's state (speed, acceleration, and location) from sensor measurements. Because the measurements carry uncertainty, however, the reliability of the estimate can be a problem. A technique that uses multiple sensors is therefore needed to detect the target and increase reliability, and a data fusion technique is needed to process the data provided by heterogeneous sensors. In this paper, a target tracking algorithm based on probabilistic data association (PDA) is proposed that fuses radar and ESM sensor measurements. The radar's azimuth and range measurements and the ESM sensor's bearing-only measurement are associated by the measurement fusion method. After the associated measurements are gated, the target state is estimated by a PDA filter. Simulation results show that the proposed algorithm provides improved estimation under linear and circular target motions.
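
The gating and PDA update steps mentioned in the abstract can be sketched in one dimension as follows. This is a textbook-style illustration, not the authors' radar/ESM fusion algorithm: it assumes a scalar state observed directly (measurement matrix H = 1), uniformly distributed clutter, and hypothetical names and default values (`pda_update`, `p_detect`, `clutter_density`, `gate_sigma`).

```python
import math
from typing import List, Tuple


def pda_update(
    x_pred: float,
    p_pred: float,
    measurements: List[float],
    r: float,
    p_detect: float = 0.9,
    clutter_density: float = 0.1,
    gate_sigma: float = 3.0,
) -> Tuple[float, float, float, List[float]]:
    """One scalar PDA measurement update; returns (x, P, beta0, betas)."""
    s = p_pred + r                # innovation variance (H = 1)
    w = p_pred / s                # Kalman gain
    # Probability that the true measurement falls inside a +/- gate_sigma gate.
    p_gate = math.erf(gate_sigma / math.sqrt(2.0))

    # Gating: keep only measurements within gate_sigma standard deviations.
    limit = gate_sigma * math.sqrt(s)
    innovations = [z - x_pred for z in measurements if abs(z - x_pred) <= limit]

    # Likelihood ratio of "target-originated" versus "clutter" per measurement.
    likelihoods = [
        p_detect * math.exp(-0.5 * v * v / s)
        / (math.sqrt(2.0 * math.pi * s) * clutter_density)
        for v in innovations
    ]
    denom = (1.0 - p_detect * p_gate) + sum(likelihoods)
    beta0 = (1.0 - p_detect * p_gate) / denom   # no measurement is correct
    betas = [l / denom for l in likelihoods]    # measurement i is correct

    # Probability-weighted (combined) innovation and state update.
    v_comb = sum(b * v for b, v in zip(betas, innovations))
    x_upd = x_pred + w * v_comb

    # Covariance: mix of "no update", standard Kalman update, and the
    # spread-of-innovations term.
    p_kalman = p_pred - w * s * w
    spread = sum(b * v * v for b, v in zip(betas, innovations)) - v_comb ** 2
    p_upd = beta0 * p_pred + (1.0 - beta0) * p_kalman + w * spread * w
    return x_upd, p_upd, beta0, betas
```

With no gated measurements the function returns the prediction unchanged (beta0 = 1). A measurement far outside the gate, such as 50.0 for a prediction of 0.0 with unit variance, is discarded before the association probabilities are computed.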

Multi-channel Video Analysis Based on Deep Learning for Video Surveillance (보안 감시를 위한 심층학습 기반 다채널 영상 분석)

  • Park, Jang-Sik;Wiranegara, Marshall;Son, Geum-Young
    • The Journal of the Korea Institute of Electronic Communication Sciences / v.13 no.6 / pp.1263-1268 / 2018
  • This paper proposes a video analysis technique for implementing a video surveillance system that combines deep learning object detection with a probabilistic data association filter for tracking multiple objects, and suggests a GPU-based implementation. The technique performs object detection and object tracking sequentially: the deep learning network uses ResNet for object detection, and a probabilistic data association filter is applied for multiple-object tracking. It can be used to detect intruders trespassing in a restricted area or to count the number of people entering a specified area. Simulations and experiments show that 48 channels of video can be analyzed at about 27 fps and that real-time video analysis is possible over RTSP.

Utilization of similarity measures by PIM with AMP as association rule thresholds (모든 주변 비율을 고려한 확률적 흥미도 측도 기반 유사성 측도의 연관성 평가 기준 활용 방안)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.24 no.1 / pp.117-124 / 2013
  • Association rule mining, a data mining technique, quantifies the relationships between sets of items in a huge database and has been applied in various fields such as internet shopping malls, healthcare, insurance, and education. There are three primary interestingness measures for association rules: support, confidence, and lift. Confidence is the most important of these measures, and rules are typically generated using it, but it is asymmetric and takes only positive values, which makes rule generation difficult. In this paper we apply similarity measures based on the probabilistic interestingness measure (PIM) with all marginal proportions (AMP) to solve this problem. A numerical example compares support, confidences, lift, chi-square statistics, and several similarity measures by PIM with AMP. The results show that the similarity measures by PIM with AMP indicate the degree of association just as confidence does. Because their values carry a sign, they also confirm the direction of association, and the best similarity measure by PIM with AMP can be selected.

Proposition of causal association rule thresholds (인과적 연관성 규칙 평가 기준의 제안)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.24 no.6 / pp.1189-1197 / 2013
  • Data mining is the process of analyzing a huge database from different perspectives and summarizing it into useful information. One of its well-studied problems is association rule generation. Association rule mining finds relationships among items in a massive database using interestingness measures such as support, confidence, and lift. Typical applications include retail market basket analysis, item recommendation systems, cross-selling, and customer relationship management. However, these interestingness measures cannot establish a causal relationship between antecedent and consequent item sets. This paper proposes causal association thresholds to compensate for this problem and then checks the three conditions of interestingness measures. A numerical example compares basic and causal association thresholds. The results show that causal association thresholds are better than basic association thresholds.

Association rule ranking function using conditional probability increment ratio (조건부 확률증분비를 이용한 연관성 순위 결정 함수)

  • Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society / v.21 no.4 / pp.709-717 / 2010
  • The task of association rule mining is to find association relationships among a set of data items in a database. There are three primary measures for association rules: support, confidence, and lift. In this paper we develop an association rule ranking function using the conditional probability increment ratio and compare it with several existing association rule ranking functions through numerical examples. The results show that the proposed function is better than the existing functions, for two reasons: its reference value is not affected by any particular association threshold, and its value lies between -1 and 1 regardless of the range of the three association thresholds. The ranking function using the conditional probability increment ratio also reflects well the differences between the association rule measures and their respective minimum association rule thresholds.

A Text Mining-based Intrusion Log Recommendation in Digital Forensics (디지털 포렌식에서 텍스트 마이닝 기반 침입 흔적 로그 추천)

  • Ko, Sujeong
    • KIPS Transactions on Computer and Communication Systems / v.2 no.6 / pp.279-290 / 2013
  • In digital forensics, log files are stored as large data sets for the purpose of tracing users' past behavior. It is difficult for investigators to analyze such large log data manually without clues. In this paper, we propose a text mining technique that extracts intrusion logs from a large log set and recommends reliable evidence to investigators. In the training stage, after preprocessing, the proposed method extracts intrusion association words from a training log set using the Apriori algorithm and computes the probability of intrusion for each association word by combining support and confidence. Robinson's method of computing confidences, originally used for filtering spam mail, is applied to extracting intrusion logs. As a result, an association word knowledge base is constructed that includes the intrusion-probability weights of the association words to improve accuracy. In the test stage, the probability of intrusion and the probability of normal activity for the logs in a test log set are each computed by Fisher's inverse chi-square classification algorithm based on the association word knowledge base, and intrusion logs are extracted by combining the two results. The intrusion logs are then recommended to investigators. Because the proposed method is trained in a way that clearly analyzes the meaning of data in large unstructured logs, it compensates for the loss of accuracy caused by data ambiguity. In addition, because it recommends intrusion logs using Fisher's inverse chi-square classification algorithm, it reduces the false positive (FP) rate and the laborious effort of extracting evidence manually.
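
The Robinson and Fisher steps named in the abstract can be sketched in their generic spam-filtering form as follows. The abstract does not give the paper's exact formulas, so this is only an assumption about their general shape; the function names, the prior strength `s`, and the prior value `x` are hypothetical.

```python
import math
from typing import List


def robinson_f(p_w: float, n_w: int, s: float = 1.0, x: float = 0.5) -> float:
    """Robinson's smoothed probability: shrink the raw estimate p_w toward the
    prior x, with the prior counting as s observations against n_w real ones."""
    return (s * x + n_w * p_w) / (s + n_w)


def chi2_sf_even(chi2: float, dof: int) -> float:
    """Survival function of the chi-square distribution for even dof,
    using the closed form exp(-m) * sum_{i < dof/2} m**i / i!, m = chi2 / 2."""
    m = chi2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_indicator(probs: List[float]) -> float:
    """Combine per-word intrusion probabilities (each strictly between 0 and 1)
    into one score in [0, 1]; values near 1 suggest intrusion, near 0 normal."""
    n = len(probs)
    # Near 1 when the word probabilities are consistently high.
    h = chi2_sf_even(-2.0 * sum(math.log(f) for f in probs), 2 * n)
    # Near 1 when the word probabilities are consistently low.
    s = chi2_sf_even(-2.0 * sum(math.log(1.0 - f) for f in probs), 2 * n)
    return (1.0 + h - s) / 2.0
```

A log line whose association words all have smoothed probabilities close to 1 scores near 1, one whose words are all close to 0 scores near 0, and uninformative words (all 0.5) give exactly 0.5. Thresholding this score is one plausible way to decide which logs to recommend.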

Analysis of Internal Determinants of Startups Linked to Early-Stage Investment Performance (초기 투자성과와 연계된 창업기업의 내부 결정요인 분석)

  • Gu, In-Hyeok;Kim, Yong-Deok;Jo, Jae-Min
    • Korean Society of Business Venturing (한국벤처창업학회): Conference Proceedings / 2022.11a / pp.195-199 / 2022
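  • This study analyzes the investment decision factors of startup investors based on quantitative data linked to startup investment performance, such as founder competencies and financial information. The main empirical results are as follows. First, the probability of investment approval was higher when the number of employees at the early stage was larger and the CEO's ownership stake was higher. Second, in terms of financial characteristics, the probability of investment approval was higher for firms with smaller sales revenue. This suggests that, for startups, future value or other qualitative factors weigh more heavily in investment decisions than short-term performance. Third, in terms of founder characteristics, the probability of investment approval was higher when the CEO's education level was higher. That is, the founder's educational background is a key variable in whether investment succeeds, which is consistent with prior studies. In addition, the association between educational background and attracting investment was relatively stronger among startups less than three years old. Considering that quantitative empirical research on startup investment has rarely been conducted because of limited disclosure of investment information and the difficulty of obtaining related data, this study is significant in that it goes beyond survey-based methods and quantitatively analyzes, for early-stage startups in Korea only, the evaluation factors linked to successful investment attraction.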

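Integrated Use of Classification Techniques and Association Rules for Real-Time CRM: Application to Credit Card Customer Churn Prediction (실시간 CRM을 위한 분류 기법과 연관성 규칙의 통합적 활용;신용카드 고객 이탈 예측에 활용)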

  • Lee, Ji-Yeong;Kim, Jong-U
    • Korea Society of Management Information Systems (한국경영정보학회): Conference Proceedings / 2007.06a / pp.135-140 / 2007
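  • Customer churn prediction is one of the major problems addressed in data mining. It is a kind of classification problem, for which techniques such as decision tree induction, logistic regression, and artificial neural networks have been widely used. In general, a churn prediction model is built as a classification model that takes customers' demographic information and contract or transaction information as input variables and churn status as the target variable. This study presents a churn prediction method that generates association rules from the additional event information arising from ongoing contact with customers and combines the result with a classification model built in the conventional way. To verify the usefulness of the method, experiments were conducted with real data from a Korean credit card company. The experiments confirmed that the proposed method performs better than traditional classification models. The advantage of the proposed method is that, in addition to the conventional input variables for churn prediction, it makes integrated use of dynamic information generated through contact between the customer and the company, which raises prediction accuracy and allows churn probabilities to be updated in real time.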
