Effective and Statistical Quantification Model for Network Data Comparing

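  • 조재익 (Information Security Center, Korea University);
  • 김호인 (Information Security Center, Korea University);
  • 문종섭 (Information Security Center, Korea University)
  • Published: 2008.01.30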

Abstract

In network data analysis, it is essential to examine how well an estimation model represents the population. This paper compares the publicly available and widely used MIT Lincoln Lab (MIT/LL) network data with the KDD CUP 99 dataset, which was modeled from the MIT/LL data, using standard information that can be extracted from network data. The comparison relies on protocol information, a standard attribute common to both datasets. Correspondence analysis, a statistical method, is used for the analysis; SVD is used to represent the results in two-dimensional space; and weighted Euclidean distance is used to quantify the network data.
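The pipeline named in the abstract (correspondence analysis, an SVD-based 2-dimensional representation, and a weighted Euclidean distance) can be illustrated with a minimal sketch. Everything specific here is an assumption and not taken from the paper: the protocol counts are made-up numbers, the row labels are hypothetical samples, the function names are illustrative, and the weighted Euclidean distance is assumed to be the chi-square distance conventionally used in correspondence analysis.

```python
import numpy as np

def correspondence_analysis(counts):
    """Classical correspondence analysis of a two-way contingency table."""
    counts = np.asarray(counts, dtype=float)
    P = counts / counts.sum()           # correspondence matrix
    r = P.sum(axis=1)                   # row masses
    c = P.sum(axis=0)                   # column masses
    # Standardized residuals: (P - r c^T) scaled by 1/sqrt(r_i * c_j)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates of rows and columns
    row_coords = (U * sv) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]
    return row_coords, col_coords, sv

def chi_square_distance(counts, i, k):
    """Weighted Euclidean (chi-square) distance between row profiles i and k."""
    counts = np.asarray(counts, dtype=float)
    profiles = counts / counts.sum(axis=1, keepdims=True)
    c = counts.sum(axis=0) / counts.sum()   # column masses act as inverse weights
    diff = profiles[i] - profiles[k]
    return float(np.sqrt(np.sum(diff ** 2 / c)))

# Hypothetical protocol counts (columns: tcp, udp, icmp), NOT real data.
# Rows stand for four hypothetical traffic samples.
counts = np.array([
    [800, 150,  50],
    [780, 160,  60],
    [600, 100, 300],
    [620,  90, 290],
])

row_coords, col_coords, sv = correspondence_analysis(counts)
coords_2d = row_coords[:, :2]   # 2-dimensional representation of the rows
dist_2d = float(np.linalg.norm(coords_2d[0] - coords_2d[2]))
dist_chi2 = chi_square_distance(counts, 0, 2)
```

With three protocol columns, the standardized residual matrix has rank at most two, so the Euclidean distance between rows in the 2-dimensional plot reproduces the chi-square distance between their profiles. With more categories, the 2-dimensional distance would only approximate it.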
