• Title/Summary/Keyword: Cosine Similarity Function

Search Result 10, Processing Time 0.031 seconds

An Extended Work Architecture for Online Threat Prediction in Tweeter Dataset

  • Sheoran, Savita Kumari;Yadav, Partibha
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.1
    • /
    • pp.97-106
    • /
    • 2021
  • Social networking platforms have become a smart way for people to interact and meet on internet. It provides a way to keep in touch with friends, families, colleagues, business partners, and many more. Among the various social networking sites, Twitter is one of the fastest-growing sites where users can read the news, share ideas, discuss issues etc. Due to its vast popularity, the accounts of legitimate users are vulnerable to the large number of threats. Spam and Malware are some of the most affecting threats found on Twitter. Therefore, in order to enjoy seamless services it is required to secure Twitter against malicious users by fixing them in advance. Various researches have used many Machine Learning (ML) based approaches to detect spammers on Twitter. This research aims to devise a secure system based on Hybrid Similarity Cosine and Soft Cosine measured in combination with Genetic Algorithm (GA) and Artificial Neural Network (ANN) to secure Twitter network against spammers. The similarity among tweets is determined using Cosine with Soft Cosine which has been applied on the Twitter dataset. GA has been utilized to enhance training with minimum training error by selecting the best suitable features according to the designed fitness function. The tweets have been classified as spammer and non-spammer based on ANN structure along with the voting rule. The True Positive Rate (TPR), False Positive Rate (FPR) and Classification Accuracy are considered as the evaluation parameter to evaluate the performance of system designed in this research. The simulation results reveals that our proposed model outperform the existing state-of-arts.

Quantitative Measure of the Changes of Migration Patterns Using Cosine Similarity (코사인 유사도를 이용한 이주패턴 변화의 정량적 측정)

  • Han, Yicheol
    • Journal of Korean Society of Rural Planning
    • /
    • v.23 no.2
    • /
    • pp.67-74
    • /
    • 2017
  • Migration is defined as the movement of people between residential places, and represents interactions between regions. Changes in migration involve changes in both the number of migrants toward/from regions and migration patterns across regions. However, most migration studies have focused only on the change in migrants, while no empirical study captures changes in migration patterns. In this paper, I present a function using the cosine similarity to measure changes in migration patterns, and apply it to 2001-2016 migration data of Korea. The results show that the migration patterns of Korea shifted in 2007, resulting in two distinct clusters. Local areas experienced various migration pattern changes despite few changes in the number of migrants.

Measures of Abnormal User Activities in Online Comments Based on Cosine Similarity (코사인 유사도 기반의 인터넷 댓글 상 이상 행위 분석 방법)

  • Kim, Minjae;Lee, Sangjin
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.24 no.2
    • /
    • pp.335-343
    • /
    • 2014
  • It is more important to ensure the credibility of internet media which influence the public opinion. However, there are vague suspicions in public from the examples of manipulation of online reviews with anonymity. In this study, we explore the possibility of manipulating public opinion in online web sites. We investigate the characteristics of comments posted by users on web sites and compare each comments by using the cosine similarity function. Our result shows followings. First, we found a correlation between the similarities of comments and the article ranks in the web sites. Second, it is possible to identify abnormal user activities indicating excessive multiple posting, double posting and astroturf activities.

Multi-Modal Based Malware Similarity Estimation Method (멀티모달 기반 악성코드 유사도 계산 기법)

  • Yoo, Jeong Do;Kim, Taekyu;Kim, In-sung;Kim, Huy Kang
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.29 no.2
    • /
    • pp.347-363
    • /
    • 2019
  • Malware has its own unique behavior characteristics, like DNA for living things. To respond APT (Advanced Persistent Threat) attacks in advance, it needs to extract behavioral characteristics from malware. To this end, it needs to do classification for each malware based on its behavioral similarity. In this paper, various similarity of Windows malware is estimated; and based on these similarity values, malware's family is predicted. The similarity measures used in this paper are as follows: 'TF-IDF cosine similarity', 'Nilsimsa similarity', 'malware function cosine similarity' and 'Jaccard similarity'. As a result, we find the prediction rate for each similarity measure is widely different. Although, there is no similarity measure which can be applied to malware classification with high accuracy, this result can be helpful to select a similarity measure to classify specific malware family.

Gaussian Noise Reduction Technique using Improved Kernel Function based on Non-Local Means Filter (비지역적 평균 필터 기반의 개선된 커널 함수를 이용한 가우시안 잡음 제거 기법)

  • Lin, Yueqi;Choi, Hyunho;Jeong, Jechang
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2018.11a
    • /
    • pp.73-76
    • /
    • 2018
  • A Gaussian noise is caused by surrounding environment or channel interference when transmitting image. The noise reduces not only image quality degradation but also high-level image processing performance. The Non-Local Means (NLM) filter finds similarity in the neighboring sets of pixels to remove noise and assigns weights according to similarity. The weighted average is calculated based on the weight. The NLM filter method shows low noise cancellation performance and high complexity in the process of finding the similarity using weight allocation and neighbor set. In order to solve these problems, we propose an algorithm that shows an excellent noise reduction performance by using Summed Square Image (SSI) to reduce the complexity and applying the weighting function based on a cosine Gaussian kernel function. Experimental results demonstrate the effectiveness of the proposed algorithm.

  • PDF

Performance Improvement of Deep Clustering Networks for Multi Dimensional Data (다차원 데이터에 대한 심층 군집 네트워크의 성능향상 방법)

  • Lee, Hyunjin
    • Journal of Korea Multimedia Society
    • /
    • v.21 no.8
    • /
    • pp.952-959
    • /
    • 2018
  • Clustering is one of the most fundamental algorithms in machine learning. The performance of clustering is affected by the distribution of data, and when there are more data or more dimensions, the performance is degraded. For this reason, we use a stacked auto encoder, one of the deep learning algorithms, to reduce the dimension of data which generate a feature vector that best represents the input data. We use k-means, which is a famous algorithm, as a clustering. Sine the feature vector which reduced dimensions are also multi dimensional, we use the Euclidean distance as well as the cosine similarity to increase the performance which calculating the similarity between the center of the cluster and the data as a vector. A deep clustering networks combining a stacked auto encoder and k-means re-trains the networks when the k-means result changes. When re-training the networks, the loss function of the stacked auto encoder and the loss function of the k-means are combined to improve the performance and the stability of the network. Experiments of benchmark image ad document dataset empirically validated the power of the proposed algorithm.

Clustering-based Statistical Machine Translation Using Syntactic Structure and Word Similarity (문장구조 유사도와 단어 유사도를 이용한 클러스터링 기반의 통계기계번역)

  • Kim, Han-Kyong;Na, Hwi-Dong;Li, Jin-Ji;Lee, Jong-Hyeok
    • Journal of KIISE:Software and Applications
    • /
    • v.37 no.4
    • /
    • pp.297-304
    • /
    • 2010
  • Clustering method which based on sentence type or document genre is a technique used to improve translation quality of SMT(statistical machine translation) by domain-specific translation. But there is no previous research using sentence type and document genre information simultaneously. In this paper, we suggest an integrated clustering method that classifying sentence type by syntactic structure similarity and document genre by word similarity information. We interpolated domain-specific models from clusters with general models to improve translation quality of SMT system. Kernel function and cosine measures are applied to calculate structural similarity and word similarity. With these similarities, we used machine learning algorithms similar to K-means to clustering. In Japanese-English patent translation corpus, we got 2.5% point relative improvements of translation quality at optimal case.

Exploration of Hierarchical Techniques for Clustering Korean Author Names (한글 저자명 군집화를 위한 계층적 기법 비교)

  • Kang, In-Su
    • Journal of Information Management
    • /
    • v.40 no.2
    • /
    • pp.95-115
    • /
    • 2009
  • Author resolution is to disambiguate same-name author occurrences into real individuals. For this, pair-wise author similarities are computed for author name entities, and then clustering is performed. So far, many studies have employed hierarchical clustering techniques for author disambiguation. However, various hierarchical clustering methods have not been sufficiently investigated. This study covers an empirical evaluation and analysis of hierarchical clustering applied to Korean author resolution, using multiple distance functions such as Dice coefficient, Cosine similarity, Euclidean distance, Jaccard coefficient, Pearson correlation coefficient.

Redundant and Abnormal Data Processing Scheme in Large-scale IoT Environment (대규모 IoT 환경에서의 중복 및 비정상 데이터 처리 기법)

  • Kim, Min-Woo;Lee, Tae-Ho;Lee, Byung-Jun;Kim, Kyung-Tae;Youn, Hee-Yong
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2019.07a
    • /
    • pp.109-110
    • /
    • 2019
  • 최근 IoT 환경에서는 고밀도로 노드가 분포되어진다. 이러한 센서 노드들은 데이터 전송 시 혼잡을 초래하는 중복 데이터를 생성하여 데이터의 정확도를 저하시킨다. 이에 따라 본 연구에서는 데이터 집중으로 인해 발생하는 네트워크의 정체 문제를 해결하기 위해 제안 기법은 사 분위(Interquatile, IRQ) 분석과 코사인 유사도 함수를 통해 데이터의 이상치와 중복성을 측정하여 중복 데이터 및 특이치를 제거한다. 본 연구를 통하여 최적의 데이터 전송을 통하여 IoT의 통신 성능을 향상시킬 수 있으며 결과적으로 데이터 감소율, 네트워크 수명 및 에너지의 효율성을 높일 수 있다.

  • PDF

Self Organized Map based Clustering for WSN Environment (WSN 환경을 위한 자체 조직 지도 기법 기반 클러스터링)

  • Kim, Min-Woo;Lee, Tae-Ho;Lee, Byung-Jun;Kim, Kyung-Tae;Youn, Hee-Yong
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2019.07a
    • /
    • pp.113-114
    • /
    • 2019
  • 다수의 센서 노드로 구성된 IoT 환경에서는 네트워크 수명, 센서 노드의 통신 범위 제한과 같은 제약 사항들이 있다. 이러한 한계점을 해결하기 위해 밀집된 센서 노드 간의 협력이 필요하다. 이때, 밀집된 센서 노드들은 에너지 낭비 및 전송 데이터의 정확도를 저하시킨다. 본 연구에서는 데이터 집중으로 인해 발생하는 네트워크의 에너지 낭비 및 전송 데이터의 정확도 문제를 해결하기 위해 자체조직지도(Self Organized Map, SOM)를 기반으로 한 클러스터링 기법을 제안한다. 결과적으로 제안된 기법을 통하여 클러스터링 된 노드들은 다른 클러스터링 기법과 비교했을 때 밀도 기반의 정확한 예측 값을 얻을 수 있다.

  • PDF