• Title/Summary/Keyword: Cosine Similarity Analysis

Multi-Modal Based Malware Similarity Estimation Method (멀티모달 기반 악성코드 유사도 계산 기법)

  • Yoo, Jeong Do;Kim, Taekyu;Kim, In-sung;Kim, Huy Kang
    • Journal of the Korea Institute of Information Security & Cryptology, v.29 no.2, pp.347-363, 2019
  • Malware has its own unique behavioral characteristics, much like DNA in living things. To respond to APT (Advanced Persistent Threat) attacks in advance, these behavioral characteristics must be extracted from malware, and each malware sample must then be classified by its behavioral similarity. In this paper, several similarity measures are estimated for Windows malware, and the malware family is predicted from the resulting similarity values. The similarity measures used are 'TF-IDF cosine similarity', 'Nilsimsa similarity', 'malware function cosine similarity' and 'Jaccard similarity'. The prediction rate differs widely across the similarity measures. Although no single similarity measure can be applied to malware classification with high accuracy, the results can help in selecting a similarity measure for classifying a specific malware family.
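
Two of the measures named in this abstract, TF-IDF cosine similarity and Jaccard similarity, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the API-call traces, the sample names, and the smoothed IDF formula are assumptions made for the example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {token: weight} dict per token list in docs."""
    n = len(docs)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF (an assumption) keeps every weight strictly positive.
        vectors.append({t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse {token: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def jaccard(a, b):
    """Jaccard similarity between the token sets of two traces."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical API-call traces for three samples.
samples = {
    "sample_a": ["CreateFile", "WriteFile", "RegSetValue", "CreateFile"],
    "sample_b": ["CreateFile", "WriteFile", "RegSetValue", "Connect"],
    "sample_c": ["Connect", "Send", "Recv", "Send"],
}
vec_a, vec_b, vec_c = tfidf_vectors(list(samples.values()))
print(cosine(vec_a, vec_b), cosine(vec_a, vec_c))
print(jaccard(samples["sample_a"], samples["sample_b"]))  # 3 shared / 4 total = 0.75
```

Cosine similarity weights tokens by frequency and rarity, while Jaccard only looks at set overlap, which is one reason the two measures can rank the same pair of samples differently.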

A Study on the Optimal Search Keyword Extraction and Retrieval Technique Generation Using Word Embedding (워드 임베딩(Word Embedding)을 활용한 최적의 키워드 추출 및 검색 방법 연구)

  • Jeong-In Lee;Jin-Hee Ahn;Kyung-Taek Koh;YoungSeok Kim
    • Journal of the Korean Geosynthetics Society, v.22 no.2, pp.47-54, 2023
  • In this paper, we propose a technique for extracting and retrieving optimal search keywords for news article classification. The technique was verified on the example of identifying trends related to North Korean construction. BigKinds, a representative Korean media platform, was used to select sample articles and extract keywords. The extracted keywords were vectorized using word embedding, and the similarity between them was examined using cosine similarity. In addition, words with a similarity of 0.5 or higher were clustered based on the top 10 frequencies. Following the BigKinds search form, keywords inside a cluster were combined with 'OR' and clusters were combined with 'AND'. In-depth analysis confirmed that meaningful articles appropriate for the original purpose were extracted. This paper is significant in that it makes it possible to classify news articles for a user's specific purpose without modifying the existing classification system or search form.
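
The clustering and query-building step described in this abstract can be sketched as follows. The abstract does not say how the clusters are formed, so this sketch assumes connected components of the "cosine similarity of 0.5 or higher" graph. The two-dimensional keyword vectors and the query string format are also assumptions; real word embeddings have far more dimensions, and the actual BigKinds syntax may differ.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def cluster_keywords(vectors, threshold=0.5):
    """Group keywords into connected components of the similarity graph."""
    keywords = list(vectors)
    unseen = set(keywords)
    clusters = []
    for kw in keywords:
        if kw not in unseen:
            continue
        unseen.discard(kw)
        members, stack = [kw], [kw]
        while stack:
            cur = stack.pop()
            # Pull in every unvisited keyword that is similar enough to cur.
            near = [o for o in keywords
                    if o in unseen and cosine(vectors[cur], vectors[o]) >= threshold]
            for o in near:
                unseen.discard(o)
                members.append(o)
                stack.append(o)
        clusters.append(members)
    return clusters

def build_query(clusters):
    """Join keywords with OR inside a cluster and clusters with AND."""
    return " AND ".join("(" + " OR ".join(c) + ")" for c in clusters)

# Hypothetical toy embeddings.
vectors = {
    "road": [1.0, 0.0],
    "highway": [0.9, 0.2],
    "Pyongyang": [0.0, 1.0],
    "Kaesong": [0.1, 0.9],
}
print(build_query(cluster_keywords(vectors)))
# (road OR highway) AND (Pyongyang OR Kaesong)
```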

Nonlinear damage detection using linear ARMA models with classification algorithms

  • Chen, Liujie;Yu, Ling;Fu, Jiyang;Ng, Ching-Tai
    • Smart Structures and Systems, v.26 no.1, pp.23-33, 2020
  • The majority of damage in engineering structures is nonlinear. Damage sensitive features (DSFs) extracted by traditional methods from linear time series models cannot effectively handle the nonlinearity induced by structural damage. A new DSF is proposed based on vector space cosine similarity (VSCS), which is combined with K-means cluster analysis and Bayesian discrimination to detect nonlinear structural damage. A reference autoregressive moving average (ARMA) model is built from measured acceleration data. This study first considers an existing DSF, the residual standard deviation (RSD). The DSF is then advanced using the VSCS, and the advanced VSCS is classified using K-means cluster analysis and Bayes discriminant analysis, respectively. The performance of the proposed approach is verified using experimental data from a three-story shear building structure and compared with the results of the existing RSD. It is demonstrated that combining the linear ARMA model and the advanced VSCS with cluster analysis and Bayes discriminant analysis, respectively, is an effective approach for detecting nonlinear damage. This approach improves the reliability and accuracy of nonlinear damage detection using the linear model and significantly reduces the computational cost. The results indicate that the proposed approach has the potential to be a promising damage detection technique.

The Redundancy Reduction Using Fuzzy C-means Clustering and Cosine Similarity on a Very Large Gas Sensor Array for Mimicking Biological Olfaction (생물학적 후각 시스템을 모방한 대규모 가스 센서 어레이에서 코사인 유사도와 퍼지 클러스터링을 이용한 중복도 제거 방법)

  • Kim, Jeong-Do;Kim, Jung-Ju;Park, Sung-Dae;Byun, Hyung-Gi;Persaud, K.C.;Lim, Seung-Ju
    • Journal of Sensor Science and Technology, v.21 no.1, pp.59-67, 2012
  • It has been reported that the latest sensor technology allows an array of 65536 conductive polymer sensors to be made with broad but overlapping selectivity to different families of chemicals, emulating the characteristics found in biological olfaction. However, excessive redundancy always brings large error and risk, as well as an inordinate amount of computation time and local minima in signal processing, e.g. in neural networks. In this paper, we propose a new method that reduces the number of sensors used for analysis by reducing redundancy between sensors and removing unstable sensors with the cosine similarity method, and that selects representative sensors with the FCM (Fuzzy C-Means) algorithm. Only the representative sensors then need to be used in the analysis. We also introduce the DWT (Discrete Wavelet Transform) as a preprocessing step for data compression in the time domain. In experimental trials, we compared gas sensor data with and without reduced redundancy. The feasibility and superiority of the proposed methods are confirmed through experiments.

Parametric and Non Parametric Measures for Text Similarity (텍스트 유사성을 위한 파라미터 및 비 파라미터 측정)

  • Mlyahilu, John;Kim, Jong-Nam
    • Journal of the Institute of Convergence Signal Processing, v.20 no.4, pp.193-198, 2019
  • The widespread circulation of genuine and fake information on the internet has led to various studies on text analysis. Copying and pasting others' work without acknowledgement and manipulating research results without proof have been trending for a while in the era of data science. Various tools have been developed to reduce, combat and possibly eradicate plagiarism in various research fields. Text similarity can be measured manually using both parametric and non-parametric methods; this study implements cosine similarity and Pearson correlation as parametric methods and Spearman correlation as a non-parametric method. The cosine similarity and Pearson correlation metrics achieved the highest similarity coefficients, while Spearman showed low similarity coefficients. We recommend non-parametric methods for measuring text similarity because they do not assume normality, whereas the parametric methods rely on normality assumptions and are prone to bias.
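
The three coefficients compared in this abstract can be written compactly, which also shows how they relate: Pearson correlation is the cosine similarity of mean-centered vectors, and Spearman correlation is Pearson correlation applied to ranks. This is a minimal sketch, not the study's code; the count vectors are hypothetical, and ties are handled with average ranks.

```python
import math

def cosine(x, y):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0

def pearson(x, y):
    """Pearson correlation: cosine similarity after subtracting each mean."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical term-count vectors for two texts over the same vocabulary.
counts_a = [1, 2, 3, 4]
counts_b = [1, 4, 9, 16]
# The rank order agrees exactly, so Spearman is 1 while Pearson is below 1.
print(cosine(counts_a, counts_b), pearson(counts_a, counts_b), spearman(counts_a, counts_b))
```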

Evaluation Model for Gab Analysis Between NCS Competence Unit Element and Traditional Curriculum (NCS 능력단위 요소와 기존 교육과정 간 갭 분석을 위한 평가모델)

  • Kim, Dae-kyung;Kim, Chang-Bok
    • Journal of Advanced Navigation Technology, v.19 no.4, pp.338-344, 2015
  • The National Competency Standards (NCS) systematize and standardize the skills required to perform a job. The NCS has developed learning modules that are made concrete and standardized by competence unit element, the unit of specific job competency. To use competence unit elements in education and training, a gap analysis against the existing curriculum is needed. Existing gap analysis has been carried out subjectively by experts. Expert gap analysis raises problems of subjective judgment, lack of accuracy, and temporal and spatial inefficiency caused by psychological factors. This paper proposes an automated evaluation model to resolve the problems of subjective evaluation. The model uses index term extraction, term frequency-inverse document frequency (TF-IDF) for feature value extraction, and the cosine similarity algorithm for gap analysis between the existing curriculum and competence unit elements. This paper presents a similarity mapping table between the existing curriculum and competence unit elements. The evaluation model should be complemented by an improved algorithm with respect to structural characteristics and speed.

A Study on the Method of Scholarly Paper Recommendation Using Multidimensional Metadata Space (다차원 메타데이터 공간을 활용한 학술 문헌 추천기법 연구)

  • Miah Kam;Jee Yeon Lee
    • Journal of the Korean Society for Information Management, v.40 no.1, pp.121-148, 2023
  • The purpose of this study is to propose a high-performing scholarly paper recommendation system based on metadata attribute similarity. The study suggests a recommendation method that combines techniques from two sub-fields of Library and Information Science: metadata use from Information Organization, and co-citation analysis, author bibliographic coupling, co-occurrence frequency, and cosine similarity from Bibliometrics. For the experiments, metadata for a total of 9,643 papers related to "inequality" and "divide" were collected and refined to derive relative coordinate values between author, keyword, and title attributes using cosine similarity. The study then conducted experiments to select the weight conditions and number of dimensions that gave good performance. The results were presented to and evaluated by users, and on this basis the research questions were discussed through reference node and recommendation combination characteristic analysis, conjoint analysis, and comparative analysis. Overall, the study showed that performance was excellent when author-related attributes were used alone or in combination with title-related attributes. If the proposed technique is used and a wide range of samples is secured, it could help improve the performance of recommendation techniques not only in literature recommendation for information services but also in various other fields.

Performance Analysis of Forwarding Schemes Based on Similarities for Opportunistic Networks (기회적 네트워크에서의 유사도 기반의 포워딩 기법의 성능 분석)

  • Kim, Sun-Kyum;Lee, Tae-Seok;Kim, Wan-Jong
    • KIISE Transactions on Computing Practices, v.24 no.3, pp.145-150, 2018
  • Forwarding in opportunistic networks performs poorly because intermittent connectivity means there may be no connected path between the source and destination nodes. Social network analysis is currently being researched in this context, and similarity is one of its methods. In this paper, we propose forwarding schemes based on representative similarities and evaluate how much the forwarding performance increases. As a result, because the schemes are based on similarities, they forward messages toward the destination node only through relay nodes with higher similarity. These schemes have low network traffic and hop count while maintaining a stable transmission delay.

Measuring gameplay similarity between human and reinforcement learning artificial intelligence (사람과 강화학습 인공지능의 게임플레이 유사도 측정)

  • Heo, Min-Gu;Park, Chang-Hoon
    • Journal of Korea Game Society, v.20 no.6, pp.63-74, 2020
  • Recently, research on automating game tests using artificial intelligence agents instead of humans has been attracting attention. This paper aims to collect play data from humans and artificial intelligence and analyze their similarity as a preliminary study for game balancing automation. Constraints were added at the learning stage in order to create artificial intelligence that plays similarly to humans. Play data were obtained from 14 people and 60 artificial intelligence agents, each playing the Flippy bird game 10 times. The collected data were compared and analyzed for movement trajectory, action position, and death position using the cosine similarity method. The analysis found an artificial intelligence agent with a similarity of 0.9 or more to humans.

Analysis of Performance Improvement of Collaborative Filtering based on Neighbor Selection Criteria (이웃 선정 조건에 따른 협력 필터링의 성능 향상 분석)

  • Lee, Soojung
    • The Journal of Korean Association of Computer Education, v.18 no.4, pp.55-62, 2015
  • Recommender systems based on collaborative filtering have been used successfully in various areas, making it more convenient to search for information. Measuring similarity is critical to the performance of these systems because it is the criterion that determines the range of recommenders. This study analyzes the distributions of similarity produced by traditional measures and investigates the relation between similarity and the number of co-rated items. On this basis, the study suggests a method for selecting reliable recommenders by restricting similarities, which compensates for the drawbacks of previous measures. Experimental results showed that restricting the similarities of neighbors with upper and lower thresholds yields better performance than previous methods, especially when fewer nearest neighbors are consulted. A maximum improvement of 0.047 was achieved for cosine similarity and 0.03 for Pearson. This result indicates that a collaborative filtering system using Pearson or cosine similarity should not consult neighbors with very high or very low similarities.
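
The idea of restricting neighbors by lower and upper similarity thresholds can be sketched as below. This is an illustration under stated assumptions, not the paper's method: the rating data, the threshold values 0.5 and 0.99, and the similarity-weighted average used for prediction are all hypothetical choices.

```python
import math

def cosine_on_corated(r_u, r_v):
    """Cosine similarity computed only over items both users rated."""
    common = r_u.keys() & r_v.keys()
    if not common:
        return 0.0
    dot = sum(r_u[i] * r_v[i] for i in common)
    nu = math.sqrt(sum(r_u[i] ** 2 for i in common))
    nv = math.sqrt(sum(r_v[i] ** 2 for i in common))
    return dot / (nu * nv) if nu and nv else 0.0

def select_neighbors(ratings, user, lower, upper, k):
    """Keep the k most similar users whose similarity lies in [lower, upper]."""
    sims = []
    for other, r in ratings.items():
        if other == user:
            continue
        s = cosine_on_corated(ratings[user], r)
        if lower <= s <= upper:
            sims.append((s, other))
    sims.sort(reverse=True)
    return sims[:k]

def predict(ratings, item, neighbors):
    """Similarity-weighted average of the neighbors' ratings for item."""
    rated = [(s, v) for s, v in neighbors if item in ratings[v]]
    den = sum(s for s, _ in rated)
    return sum(s * ratings[v][item] for s, v in rated) / den if den else None

# Hypothetical ratings; v3 shares a single item with u, so its cosine is exactly 1.
ratings = {
    "u": {"a": 5, "b": 3},
    "v1": {"a": 5, "b": 3, "c": 4},
    "v2": {"a": 1, "b": 5, "c": 2},
    "v3": {"a": 4, "c": 5},
}
neighbors = select_neighbors(ratings, "u", lower=0.5, upper=0.99, k=2)
print(neighbors, predict(ratings, "c", neighbors))
```

User v3 shows why an upper threshold can matter: with only one co-rated item, cosine similarity is 1 regardless of how different the two ratings are, so a very high similarity may reflect sparse overlap more than real agreement.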