• Title/Summary/Keyword: group model clustering

Search Result 41, Processing Time 0.021 seconds

Aggregation Techniques for Alert Data of Intrusion Detection System using Data Mining (데이터마이닝을 이용한 침입 탐지 시스템의 경보데이터 축약기법)

  • Hu, Moon-Heang
    • Proceedings of the KAIS Fall Conference
    • /
    • 2009.05a
    • /
    • pp.764-767
    • /
    • 2009
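  • This paper proposes an alert data aggregation technique based on data mining clustering. The proposed clustering-based technique groups alert data by similarity, and the resulting model can be used to automate the classification of new alert data. It can be applied not only to previously detected attack types but also to the classification and analysis of new or modified alerts. In addition, by analyzing why each cluster was generated and extracting sequences between clusters, it helps users understand the sequential structure of attacks and the strategies hidden behind them, and it can predict alerts likely to follow the current alert.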


A Study on Web-User Clustering Algorithm for Web Personalization (웹 개인화를 위한 웹사용자 클러스터링 알고리즘에 관한 연구)

  • Lee, Hae-Kag
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.12 no.5
    • /
    • pp.2375-2382
    • /
    • 2011
  • User clustering for web navigation pattern discovery is very useful for obtaining the preferences and behavior patterns of users with respect to web pages. In addition, the information obtained from user clustering is essential for web personalization and customer grouping. In this paper, an algorithm for clustering the web navigation paths of users is proposed, and some special navigation patterns can be recognized by the algorithm. The proposed algorithm has two clustering phases. In the first phase, all paths are classified into k groups on the basis of their similarities. The solution obtained in the first phase is not a global optimum, but it gives a good, feasible initial solution for the second phase. In the second phase, the first-phase solution is improved by a revised K-means algorithm. In the revised K-means algorithm, paths are grouped by a hyperplane instead of by the distance between a path and a group center. Experimental results show that the proposed method is more efficient.

A Load Balanced Clustering Model for Energy Efficient Packet Transmission in Wireless Sensor Networks (무선 센서 네트워크에서 에너지 효율적 패킷 전송을 위한 부하 균형 클러스터링 모델)

  • Lee, Jae-Hee;Kim, Byung-Ki;Kang, Seong-Ho
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.4 no.12
    • /
    • pp.409-414
    • /
    • 2015
  • Energy conservation is the most important issue for the long-term operation of sensor nodes with limited power resources. Clustering is one of the most energy-efficient techniques for grouping sensor nodes into distinct clusters. However, in a cluster-based WSN, cluster heads (CHs) and gateways bear an extra workload to send the processed data to the sink. Inappropriate cluster formation may overload gateways and increase communication latency. In this paper, we propose a novel load-balanced clustering model for improving energy efficiency and guaranteeing a long network lifetime. We present the results of performance experiments on designs that use a branch-and-bound algorithm and a multi-start local search algorithm, compared with the existing load-balanced clustering model.

Similarity Measurement with Interestingness Weight for Improving the Accuracy of Web Transaction Clustering (웹 트랜잭션 클러스터링의 정확성을 높이기 위한 흥미가중치 적용 유사도 비교방법)

  • Kang, Tae-Ho;Min, Young-Soo;Yoo, Jae-Soo
    • The KIPS Transactions:PartD
    • /
    • v.11D no.3
    • /
    • pp.717-730
    • /
    • 2004
  • Recently, much research on web site personalization has been carried out. Web personalization predicts the set of most interesting URLs for each user through data mining approaches such as clustering techniques. Most existing clustering-based methods represent web transactions as bit vectors that indicate whether or not a user visited a certain URL. The similarity of web transactions is then determined by the degree to which the bit vectors match. However, since the existing methods consider only whether users visit a certain URL, the users' interest in the URL is excluded from web transaction clustering. That is, web transactions with different visit purposes or inclinations may be classified into the same group. In this paper, we propose an enhanced transaction model with interestingness weights to solve such problems, and a new similarity measure that exploits the proposed transaction model. Performance evaluation shows that our similarity measure improves the accuracy of web transaction clustering over the existing method.
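
The contrast between bit-vector matching and interest-weighted similarity can be sketched in Python as follows. This is a minimal illustration, not the authors' method: the weights (imagined here as a normalized dwell time per URL), the choice of cosine similarity, and the names `cosine` and `to_bits` are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def to_bits(transaction):
    """Reduce a weighted transaction to a visited / not-visited bit vector."""
    return {url: 1.0 for url, w in transaction.items() if w > 0}

# Hypothetical transactions: URL -> interest weight (assumed, e.g. dwell-time share).
t1 = {"/news": 0.9, "/sports": 0.1}
t2 = {"/news": 0.1, "/sports": 0.9}

binary_sim = cosine(to_bits(t1), to_bits(t2))   # both users visited the same URLs
weighted_sim = cosine(t1, t2)                   # but with opposite interests

print(f"bit-vector similarity: {binary_sim:.3f}")
print(f"weighted similarity:   {weighted_sim:.3f}")
```

With these made-up weights, the bit vectors look identical while the weighted vectors are far apart, which is the failure case the abstract describes.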

Adaptive Data Mining Model using Fuzzy Performance Measures (퍼지 성능 측정자를 이용한 적응 데이터 마이닝 모델)

  • Rhee, Hyun-Sook
    • The KIPS Transactions:PartB
    • /
    • v.13B no.5 s.108
    • /
    • pp.541-546
    • /
    • 2006
  • Data mining is the process of finding hidden patterns inside a large data set. Cluster analysis has been used as a popular technique for data mining. It is a fundamental process of data analysis and has played an important role in solving many problems in pattern recognition and image processing. If fuzzy cluster analysis is to make a significant contribution to engineering applications, much more attention must be paid to the fundamental decision on the number of clusters in the data. This is related to the cluster validity problem, which concerns how well a clustering has identified the structure that is present in the data. In this paper, we design an adaptive data mining model using fuzzy performance measures. It discovers clusters through an unsupervised neural network model based on a fuzzy objective function and evaluates clustering results with a fuzzy performance measure. We also present experimental results on newsgroup data. They show that the proposed model can be used as a document classifier.

Alert Correlation Analysis based on Clustering Technique for IDS (클러스터링 기법을 이용한 침입 탐지 시스템의 경보 데이터 상관관계 분석)

  • Shin, Moon-Sun;Moon, Ho-Sung;Ryu, Keun-Ho;Jang, Jong-Su
    • The KIPS Transactions:PartC
    • /
    • v.10C no.6
    • /
    • pp.665-674
    • /
    • 2003
  • In this paper, we propose an approach to correlating alerts using clustering analysis, a data mining technique, in order to support intrusion detection systems. Intrusion detection techniques are still far from perfect. Current intrusion detection systems cannot fully detect novel attacks or variations of known attacks without generating a large number of false alerts. In addition, all current intrusion detection systems focus on low-level attacks or anomalies. Consequently, it is difficult to understand the intrusion behind the alerts and take appropriate actions. Clustering analysis groups data objects into clusters such that objects belonging to the same cluster are similar, while those belonging to different clusters are dissimilar. Using a clustering technique, we can analyze alert data efficiently and extract high-level knowledge about attacks. Namely, it is possible to classify new types of alerts as well as existing ones. It also helps to understand the logical steps and strategies behind a series of attacks using sequences of clusters, and can potentially be applied to predict attacks in progress.

Automated Detecting and Tracing for Plagiarized Programs using Gumbel Distribution Model (굼벨 분포 모델을 이용한 표절 프로그램 자동 탐색 및 추적)

  • Ji, Jeong-Hoon;Woo, Gyun;Cho, Hwan-Gue
    • The KIPS Transactions:PartA
    • /
    • v.16A no.6
    • /
    • pp.453-462
    • /
    • 2009
  • Studies on software plagiarism detection, prevention, and judgement have become widespread due to growing interest in, and the importance of, protecting and authenticating software intellectual property. Many previous studies focused on comparing all pairs of submitted codes by using attribute counting, token patterns, program parse trees, and similarity measuring algorithms. It is important to provide a clear-cut model for distinguishing plagiarism from collaboration. This paper proposes a source code clustering algorithm using a probability model based on an extreme value distribution. First, we propose an asymmetric distance measure pdist($P_a$, $P_b$) to measure the similarity of $P_a$ and $P_b$. Then, we construct the Plagiarism Direction Graph (PDG) for a given program set using pdist($P_a$, $P_b$) as edge weights. We then transform the PDG into a Gumbel Distance Graph (GDG) model, since we found that the pdist($P_a$, $P_b$) score distribution is similar to the well-known Gumbel distribution. Second, we newly define pseudo-plagiarism, which is a sort of virtual plagiarism forced by a very strong functional requirement in the specification. We conducted experiments with 18 groups of programs (more than 700 source codes) collected from the ICPC (International Collegiate Programming Contest) and KOI (Korean Olympiad for Informatics) programming contests. The experiments showed that most plagiarized codes could be detected with high sensitivity and that our algorithm successfully separated real plagiarism from pseudo-plagiarism.
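
One way to use the observation that pairwise scores look Gumbel-distributed is to fit a Gumbel distribution to the scores and ask how improbable a given score is under it. The Python sketch below does this under several assumptions that are not taken from the paper: the fit uses the method of moments, a higher score is treated as more suspicious, and the sample scores and the names `fit_gumbel` and `upper_tail` are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def fit_gumbel(scores):
    """Method-of-moments fit: return (location mu, scale beta)."""
    beta = statistics.stdev(scores) * math.sqrt(6) / math.pi
    mu = statistics.mean(scores) - EULER_GAMMA * beta
    return mu, beta

def upper_tail(x, mu, beta):
    """P(X > x) for a Gumbel(mu, beta) variable: 1 - exp(-exp(-(x - mu) / beta))."""
    z = math.exp(-(x - mu) / beta)
    return -math.expm1(-z)

# Hypothetical scores for ordinary (independent) program pairs.
background = [0.10, 0.12, 0.15, 0.11, 0.13, 0.14, 0.12, 0.16]
mu, beta = fit_gumbel(background)

typical_p = upper_tail(0.12, mu, beta)  # an ordinary-looking score
extreme_p = upper_tail(0.60, mu, beta)  # a score far outside the bulk

print(f"mu={mu:.4f}, beta={beta:.4f}")
print(f"P(score > 0.12) = {typical_p:.3f}")
print(f"P(score > 0.60) = {extreme_p:.3e}")
```

A pair whose score has a very small tail probability is unlikely to have arisen by chance under the fitted model; where to put the cut-off is a separate decision that this sketch does not address.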

Classification of Seoul Metro Stations Based on Boarding/ Alighting Patterns Using Machine Learning Clustering (기계학습 클러스터링을 이용한 승하차 패턴에 따른 서울시 지하철역 분류)

  • Min, Meekyung
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.18 no.4
    • /
    • pp.13-18
    • /
    • 2018
  • In this study, we classify Seoul metro stations according to boarding and alighting patterns using a machine learning technique. The target data is the hourly number of boarding and alighting passengers per day at 233 subway stations from 2008 to 2017, provided by the public data portal. A Gaussian mixture model (GMM) and K-means clustering are used as the machine learning techniques for classifying subway stations. The distribution of passengers' boarding and alighting times can be modeled by the Gaussian mixture model. The K-means clustering algorithm is then used for unsupervised learning on the data obtained from the GMM modeling. As a result, Seoul metro stations are classified into four groups according to boarding and alighting patterns. The results of this study can be used as basic knowledge for analyzing the characteristics of Seoul subway stations and for analyzing them economically, socially, and culturally. The method of this research can be applied to public data and big data in areas requiring clustering.
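
The K-means step can be illustrated with a small pure-Python version of Lloyd's algorithm. This is a sketch under stated assumptions, not the study's pipeline: the GMM fitting step is omitted, each station is assumed to have already been reduced to a hypothetical two-value feature (share of boardings in the morning peak, share in the evening peak), and the station names, feature values, and starting centers are invented for the example.

```python
def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centers[i])),
    )

def kmeans(points, centers, iterations=20):
    """Lloyd's algorithm from the given starting centers; returns (centers, labels)."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        new_centers = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else c
            for c, members in zip(centers, clusters)
        ]
        if new_centers == centers:  # assignments have stabilized
            break
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    return centers, labels

# Hypothetical stations: (morning-peak boarding share, evening-peak boarding share).
stations = {
    "station_a": (0.8, 0.2),  # residential-looking: people leave in the morning
    "station_b": (0.7, 0.3),
    "station_c": (0.2, 0.8),  # business-looking: people leave in the evening
    "station_d": (0.3, 0.7),
}
points = list(stations.values())
centers, labels = kmeans(points, centers=[points[0], points[2]])

for name, label in zip(stations, labels):
    print(name, "-> group", label)
```

Here k is 2 only to keep the toy data readable; the study reports four groups.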

Genetic Clustering with Semantic Vector Expansion (의미 벡터 확장을 통한 유전자 클러스터링)

  • Song, Wei;Park, Soon-Cheol
    • The Journal of the Korea Contents Association
    • /
    • v.9 no.3
    • /
    • pp.1-8
    • /
    • 2009
  • This paper proposes a new document clustering system using a fuzzy logic-based genetic algorithm (GA) and semantic vector expansion technology. Many GA papers have noted that success depends on two factors: the diversity of the population and the capability to converge. We use fuzzy logic-based operators to adaptively adjust the balance between these two factors. In traditional document clustering, the most popular and straightforward way to represent a document is the vector space model (VSM). However, this approach not only leads to a high-dimensional feature space but also ignores the semantic relationships between some important words, which affects the accuracy of clustering. In this paper, we use latent semantic analysis (LSA) to expand the documents into corresponding semantic vectors at the concept level, rather than at the level of individual terms. At the same time, the sizes of the vectors can be reduced drastically. We test our clustering algorithm on the 20 Newsgroups and Reuters collection data sets. The results show that our method outperforms the conventional GA in various document representation environments.

Application of Hidden Markov Model to Intrusion Detection System (침입탐지 시스템을 위한 은닉 마르코프 모델의 적용)

  • Choe, Jong-Ho;Jo, Seong-Bae
    • Journal of KIISE:Software and Applications
    • /
    • v.28 no.6
    • /
    • pp.429-438
    • /
    • 2001
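  • With the spread of information and communication infrastructure, intrusions into computer systems and the resulting damage are increasing, and interest in and research on intrusion detection systems are growing. This paper proposes an intrusion detection system that uses a hidden Markov model (HMM) to model event ID information generated from users' normal behavior and then detects abnormal user behavior. Preprocessed event ID sequences are built into a normal behavior model using the forward-backward procedure and the Baum-Welch re-estimation formula. A decision is made by using the forward procedure to compute the probability that the sequence under evaluation was generated by the normal behavior model and comparing this value with a threshold. Through experiments, we determined the optimal HMM parameters for intrusion detection and evaluated normal behavior modeling performance by comparing single modeling without user distinction, per-user modeling, and per-user-group modeling. The experimental results confirmed that the proposed system properly detects the intrusions that occurred, but also showed that more sophisticated clustering of models is needed to build a highly reliable intrusion detection system.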
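
The decision step described above (score a sequence with the forward procedure, then compare against a threshold) can be sketched in Python. Everything numeric here is an assumption for illustration: the two-state model, its probabilities, the three event IDs, the per-event normalization, and the threshold value are hypothetical, and the Baum-Welch training step is omitted.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Log P(obs | model) via the scaled forward procedure."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][obs[t]]
                for s in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under the model
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik

# Hypothetical "normal behavior" model: 2 hidden states, event IDs 0..2.
START = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [[0.80, 0.15, 0.05], [0.10, 0.80, 0.10]]
THRESHOLD = -1.5  # assumed cut-off on average log-likelihood per event

def is_normal(obs):
    """True if the per-event log-likelihood stays at or above the threshold."""
    return forward_log_likelihood(obs, START, TRANS, EMIT) / len(obs) >= THRESHOLD

print(is_normal([0, 0, 1, 1, 0]))  # events the model emits often
print(is_normal([2, 2, 2, 2, 2]))  # event the model rarely emits
```

Dividing by the sequence length is a design choice made here so that sequences of different lengths share one threshold; the abstract only states that the probability is compared with a threshold.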
