• Title/Abstract/Keywords: Similarity measures

304 search results, processing time 0.024 seconds

Multi-Modal Based Malware Similarity Estimation Method

  • 유정도;김태규;김인성;김휘강
    • 정보보호학회논문지 / Vol. 29, No. 2 / pp.347-363 / 2019
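  • Just as human DNA does not change, malware in cyberspace also has unique behavioral characteristics that do not change. To secure defenses against APT (Advanced Persistent Threat) attacks in advance, the malicious behavioral characteristics of malware must be extracted. This first requires computing the similarity between malware samples so that similar samples can be grouped together. In this paper, we predict malware types using 'TF-IDF cosine similarity', 'Nilsimsa similarity', 'malware function similarity', and 'Jaccard similarity' as similarity measures between malware running on Windows OS, and present the results. The experiments showed that, for each similarity measure, the prediction rate varied greatly by malware type. No similarity measure was markedly more accurate across all results, but the results should help in deciding which similarity measure is relatively advantageous when classifying malware of a particular family.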
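Two of the measures named in this abstract, Jaccard similarity and TF-IDF cosine similarity, can be sketched in a few lines. The following is a minimal illustration and not the paper's implementation: the API-call traces in `samples` are hypothetical stand-ins for extracted malware features, and the smoothed IDF formula is one common variant chosen here as an assumption.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two feature sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


def tfidf_vectors(docs):
    """Return one {token: tf-idf weight} dict per token list in docs."""
    n = len(docs)
    # document frequency: in how many samples each token occurs
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {}
        for tok, count in tf.items():
            # smoothed idf (an assumption), so a token found in every
            # sample still keeps a nonzero weight
            idf = math.log((1 + n) / (1 + df[tok])) + 1.0
            vec[tok] = (count / len(doc)) * idf
        vectors.append(vec)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {token: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical API-call traces standing in for extracted malware features.
samples = [
    ["CreateFileW", "WriteFile", "RegSetValueExW", "WriteFile"],
    ["CreateFileW", "WriteFile", "RegSetValueExW"],
    ["socket", "connect", "send", "recv"],
]
vecs = tfidf_vectors(samples)
```

Here `jaccard(samples[0], samples[1])` ignores how often a call repeats, while `cosine(vecs[0], vecs[1])` is sensitive to it. This is one reason the two measures can rank the same pair of samples differently.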

The Evaluation Measure of Text Clustering for the Variable Number of Clusters

  • 조태호
    • 한국정보과학회:학술대회논문집 / 한국정보과학회 2006년도 가을 학술발표논문집 Vol.33 No.2 (B) / pp.233-237 / 2006
  • This study proposes a new measure for evaluating the performance of text clustering. When the K-means algorithm or Kohonen Networks are used for text clustering, the number of clusters is fixed in advance as a parameter, whereas with the single pass algorithm the number of clusters is not predictable. Given labeled documents, the result of text clustering with the K-means algorithm or a Kohonen Network can be evaluated by setting the number of clusters to the number of given target categories, mapping each cluster to a target category, and using the evaluation measures of text. With the single pass algorithm, however, if the number of clusters differs from the number of target categories, such measures are useless for evaluating the clustering result. This study proposes an evaluation measure of text clustering based on intra-cluster similarity and inter-cluster similarity, which is called CI (Clustering Index) in this article.
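The two quantities this abstract builds on, intra-cluster and inter-cluster similarity, can be computed for any number of clusters without reference to target categories. The sketch below shows only those two ingredients. How the paper combines them into CI is not stated in the abstract, so no combining formula is given here. The cosine measure and the toy vectors are assumptions for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def intra_cluster_similarity(clusters):
    """Mean pairwise similarity of documents that share a cluster."""
    sims = [cosine(u, v) for c in clusters for u, v in combinations(c, 2)]
    return sum(sims) / len(sims) if sims else 1.0


def inter_cluster_similarity(clusters):
    """Mean pairwise similarity of documents placed in different clusters."""
    sims = [cosine(u, v)
            for c1, c2 in combinations(clusters, 2)
            for u in c1 for v in c2]
    return sum(sims) / len(sims) if sims else 0.0


# Toy term-frequency vectors (illustrative data only).
good = [[[1, 0, 0], [2, 0, 0]], [[0, 1, 1], [0, 2, 2]]]
bad = [[[1, 0, 0], [0, 1, 1]], [[2, 0, 0], [0, 2, 2]]]
```

A good clustering such as `good` scores high on intra-cluster similarity and low on inter-cluster similarity, and `bad` shows the reverse. Neither function needs the number of clusters to match a set of labels.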


Semantic-Based K-Means Clustering for Microblogs Exploiting Folksonomy

  • Heu, Jee-Uk
    • Journal of Information Processing Systems / Vol. 14, No. 6 / pp.1438-1444 / 2018
  • Recently, with the development of Internet technologies and the spread of smart devices, the use of microblogs such as Facebook, Twitter, and Instagram has been rapidly increasing. Many users check for new information on microblogs because the content on their timelines is continually updated. Clustering algorithms are therefore necessary to arrange microblog content into groups for a user who wants the newest information. However, microblogs have word limits, so there is not enough information to analyze for content clustering. In this paper, we propose a semantic-based K-means clustering algorithm that not only measures the similarity between data represented in a vector space model, but also measures the semantic similarity between the data by exploiting the TagCluster for clustering. Through experimental results on the RepLab2013 Twitter dataset, we show the effectiveness of the semantic-based K-means clustering algorithm.

Workflow Clustering Methodology Using Structural Similarity Metrics

  • 정재윤;배준수;강석호
    • 대한산업공학회지 / Vol. 33, No. 1 / pp.99-109 / 2007
  • To realize process-driven management, many companies have been launching business process management systems. A business process is a collection of standardized and structured tasks that induce value creation for a company. Moreover, it is recognized as one of the significant intangible business assets for achieving competitive advantages. This research introduces a novel approach to workflow process analysis, which is becoming more significant as process-aware information systems spread widely across companies. In this paper, a methodology for workflow clustering based on process similarity is proposed. The purpose of workflow clustering is to analyze accumulated process definitions in order to assist the design of new processes and the improvement of existing ones. The proposed methodology exploits measures of structural similarity of workflow processes. The methodology was tested with synthetic process models to illustrate the implications of workflow clustering.

Application of Similarity Measure for Fuzzy C-Means Clustering to Power System Management

  • Park, Dong-Hyuk;Ryu, Soo-Rok;Park, Hyun-Jeong;Lee, Sang-H.
    • International Journal of Fuzzy Logic and Intelligent Systems / Vol. 8, No. 1 / pp.18-23 / 2008
  • An FCM (fuzzy c-means) method that uses locational price and regional information between locations is proposed in this paper. Any point in a networked system has its own values indicating the physical characteristics of that networked system and, at the same time, regional information. The similarity measure used for FCM in this paper is defined through the system-wide characteristic values at each point. To avoid grouping geometrically distant locations that have similar measures, the locational information is properly considered and incorporated into the proposed similarity measure. We have verified that the proposed measure produces a proper classification of a networked system, and follow with an example of a networked electricity system.
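For orientation, the standard fuzzy c-means membership update can be sketched as follows. The `combined_distance` helper is purely hypothetical: the abstract does not give the paper's similarity measure, so blending a price gap with a geographic gap through a weight `alpha` is an assumption made only to show how locational information could keep distant points with similar prices apart.

```python
import math


def combined_distance(price_a, price_b, loc_a, loc_b, alpha=0.5):
    """Hypothetical blend of price gap and geographic gap (not the paper's)."""
    price_gap = abs(price_a - price_b)
    geo_gap = math.dist(loc_a, loc_b)  # Euclidean distance, Python 3.8+
    return (1 - alpha) * price_gap + alpha * geo_gap


def fcm_memberships(dists, m=2.0):
    """Standard FCM memberships of one point, given its distance to each center.

    m > 1 is the fuzzifier; the returned memberships sum to 1.
    """
    if any(d == 0 for d in dists):
        # the point coincides with one or more centers: share membership there
        zeros = [d == 0 for d in dists]
        return [1.0 / sum(zeros) if z else 0.0 for z in zeros]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]
```

With `alpha=0` the distance depends on price alone, so two far-apart nodes with equal prices would be indistinguishable. Raising `alpha` separates them.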

Relevance Feedback for Content Based Retrieval Using Fuzzy Integral

  • Young Sik Choi
    • 인터넷정보학회논문지 / Vol. 1, No. 2 / pp.89-96 / 2000
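  • Relevance feedback is used as a method for learning a user's subjective perception of image similarity, and interest in it has been growing recently. Most relevance feedback techniques assume that the features used to measure image similarity are independent of one another, but this assumption places a considerable constraint on modeling similarity judgments. This paper proposes a better way of modeling similarity judgments using fuzzy measures and the Choquet integral, and proposes a relevance feedback algorithm based on it. Experimental results show that the proposed method outperforms conventional relevance feedback based on weighted averaging.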
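The contrast this abstract draws between a weighted average and the Choquet integral can be made concrete with the discrete Choquet integral. The sketch below is a generic textbook form and not the paper's algorithm. The feature names and fuzzy-measure values are hypothetical.

```python
def choquet_integral(scores, measure):
    """Discrete Choquet integral of per-feature scores w.r.t. a fuzzy measure.

    scores:  {feature: similarity score in [0, 1]}
    measure: {frozenset of features: weight}, monotone, with the full
             feature set mapped to 1.0
    """
    order = sorted(scores, key=scores.get)  # features by ascending score
    total, previous = 0.0, 0.0
    for i, feature in enumerate(order):
        # coalition of features whose score is at least the current one
        coalition = frozenset(order[i:])
        total += (scores[feature] - previous) * measure[coalition]
        previous = scores[feature]
    return total


scores = {"color": 0.8, "texture": 0.4}  # hypothetical per-feature similarities

# Additive measure: the features act independently (plain weighted average).
additive = {
    frozenset({"color"}): 0.5,
    frozenset({"texture"}): 0.5,
    frozenset({"color", "texture"}): 1.0,
}

# Non-additive measure: each feature alone counts for little, and the two
# together count fully, which models an interaction between features.
interacting = {
    frozenset({"color"}): 0.3,
    frozenset({"texture"}): 0.3,
    frozenset({"color", "texture"}): 1.0,
}
```

With `additive` the result equals the ordinary mean of the two scores. With `interacting` the result is lower, because a high color score is discounted unless texture agrees. A weighted average cannot express that behavior.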


Entropy-based Similarity Measures for Memory-based Collaborative Filtering

  • Kwon, Hyeong-Joon;Latchman, Haniph
    • International Journal of Internet, Broadcasting and Communication / Vol. 5, No. 2 / pp.5-10 / 2013
  • We propose a novel similarity measure using weighted difference entropy (WDE) to improve the performance of collaborative filtering (CF) systems. The proposed similarity metric evaluates the entropy of the preference score differences between the items commonly rated by two users, and normalizes it using Gaussian, tanh, and sigmoid functions. The experiments showed significant improvement across experimental settings. They involved varying the number of nearest neighbors, and we present results for two data sets with different characteristics, including results for the quality of recommendation.

Comparison Study for Similarities Based on Distance Measure and Fuzzy Number

  • 이상혁
    • 한국지능시스템학회논문지 / Vol. 17, No. 1 / pp.1-6 / 2007
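  • A similarity measure was constructed using a distance measure, and the usefulness of the proposed similarity measure was confirmed by proof. Results on an existing similarity measure constructed using fuzzy numbers and the center-of-gravity method were introduced, and the two similarity measures were compared by computing similarities for membership functions of various shapes.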

Acceleration sensor, and embedded system using location-aware

  • He, Wei;Nayel, Mohamed
    • 중소기업융합학회논문지 / Vol. 3, No. 1 / pp.23-30 / 2013
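  • This paper introduces fuzzy entropy and a similarity measure that can quantify the uncertainty and similarity of data such as real values. The design of the fuzzy entropy and the similarity measure is explained and proved. The obtained measures are applied to a computational process and discussed. Extensions of the data quantification results, such as to decision making and fuzzy game theory, are also discussed.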


Improving Performance of Jaccard Coefficient for Collaborative Filtering

  • Lee, Soojung
    • 한국컴퓨터정보학회논문지 / Vol. 21, No. 11 / pp.121-126 / 2016
  • In recommender systems based on collaborative filtering, measuring similarity is critical for determining the range of recommenders. The data sparsity problem is fundamental in collaborative filtering systems, and it is partly solved by combining the Jaccard coefficient with traditional similarity measures. This study proposes a new coefficient that improves on the Jaccard coefficient by compensating for its drawbacks. We conducted experiments using datasets of various characteristics for performance analysis. In a comparison between the proposed coefficient and the Pearson correlation similarity metric widely used to date, the two metrics yielded competitive performance on a dense dataset, while the proposed coefficient showed much better performance on a sparser dataset. In addition, a comparison with the Jaccard coefficient showed that the proposed coefficient yielded far better performance as the dataset became denser. Overall, the proposed coefficient demonstrated the best prediction and recommendation performance among the metrics tested.
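The baseline this abstract starts from, the Jaccard coefficient combined with a traditional similarity measure, can be sketched as a product of Jaccard overlap and Pearson correlation. This is the familiar combination only and not the new coefficient the paper proposes, whose definition the abstract does not give. The user names and ratings are hypothetical.

```python
import math


def jaccard(ratings_u, ratings_v):
    """Share of items rated by both users among items rated by either."""
    items_u, items_v = set(ratings_u), set(ratings_v)
    union = items_u | items_v
    return len(items_u & items_v) / len(union) if union else 0.0


def pearson(ratings_u, ratings_v):
    """Pearson correlation over co-rated items (0.0 when undefined)."""
    common = sorted(set(ratings_u) & set(ratings_v))
    if len(common) < 2:
        return 0.0
    xs = [ratings_u[i] for i in common]
    ys = [ratings_v[i] for i in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def jaccard_weighted_pearson(ratings_u, ratings_v):
    """Pearson correlation scaled down when few items are co-rated."""
    return jaccard(ratings_u, ratings_v) * pearson(ratings_u, ratings_v)


# Hypothetical {item: rating} profiles.
alice = {"m1": 5, "m2": 3, "m3": 4}
bob = {"m1": 4, "m2": 2, "m3": 3, "m4": 5}
carol = {"m1": 5, "m5": 1}
```

Jaccard looks only at which items were rated and ignores the rating values. Pearson uses the values and is unreliable when few items are co-rated. Multiplying the two discounts correlations that rest on thin overlap, which is how the combination eases the sparsity problem mentioned above.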