• Title/Summary/Keyword: Cosine Similarity

The proposition of cosine net confidence in association rule mining (연관 규칙 마이닝에서의 코사인 순수 신뢰도의 제안)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.25 no.1 / pp.97-106 / 2014
  • Big data technology was developed to predict an increasingly diverse contemporary society more accurately, to operate it more efficiently, and to enable techniques that were impossible in the past. It can be used at the national level in fields such as the social sciences, economics, politics, culture, and science and technology. Analyzing big data requires data mining techniques that can find valuable information. Data mining techniques associated with big data include text mining, opinion mining, cluster analysis, and association rule mining. The most widely used of these is association rule mining, which finds relationships between itemsets based on association thresholds such as support, confidence, lift, and similarity measures. This paper proposes cosine net confidence as an association threshold, checks it against the conditions for an interestingness measure proposed by Piatetsky-Shapiro, and examines its various characteristics. Basic confidence, cosine similarity, and cosine net confidence are compared using a numerical example. The results show that cosine net confidence performs better than basic confidence and cosine similarity because it reflects the direction of the association.
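
For reference, the two baseline measures in the comparison can be computed directly from transaction counts. This is a minimal sketch assuming the common definitions conf(A→B) = P(B|A) and cosine(A,B) = P(A∩B)/sqrt(P(A)P(B)); the toy transactions are invented, and the paper's cosine net confidence formula is not reproduced here.

```python
from math import sqrt

def rule_measures(transactions, a, b):
    """Support, confidence and cosine similarity for the rule a -> b."""
    n = len(transactions)
    n_a = sum(1 for t in transactions if a in t)
    n_b = sum(1 for t in transactions if b in t)
    n_ab = sum(1 for t in transactions if a in t and b in t)
    support = n_ab / n
    confidence = n_ab / n_a if n_a else 0.0
    # cosine = P(A and B) / sqrt(P(A) * P(B)), which simplifies to n_ab / sqrt(n_a * n_b)
    cosine = n_ab / sqrt(n_a * n_b) if n_a and n_b else 0.0
    return support, confidence, cosine

# Toy transactions (illustrative only).
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk"},
    {"bread", "eggs"},
]
print(rule_measures(transactions, "milk", "bread"))
```

Here confidence (2/3) is below P(bread) = 3/4, so milk and bread are slightly negatively associated, yet both measures are non-negative by construction and neither signals the direction of the association by its sign.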

Improving the Performance of Document Clustering with Distributional Similarities (분포유사도를 이용한 문헌클러스터링의 성능향상에 대한 연구)

  • Lee, Jae-Yun
    • Journal of the Korean Society for Information Management / v.24 no.4 / pp.267-283 / 2007
  • In this study, measures of distributional similarity such as KL-divergence are applied to document clustering in place of the traditional cosine measure, which is the most prevalent vector similarity measure for this task. Three variations of KL-divergence are investigated: Jensen-Shannon divergence, symmetric skew divergence, and minimum skew divergence. To verify the contribution of distributional similarities to document clustering, two experiments were designed and carried out on three test collections. In the first experiment, the clustering performance of the three divergence measures was compared with that of the cosine measure; minimum skew divergence outperformed the other divergence measures as well as the cosine measure. In the second experiment, second-order distributional similarities were calculated by applying the Pearson correlation coefficient to the first-order similarity matrices, and these second-order similarities were found to improve the overall performance of document clustering. These results suggest that minimum skew divergence should be chosen as the document vector similarity measure when both time and accuracy matter, and that second-order similarity is a good choice when only clustering accuracy matters.
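
The divergences named above can be sketched on term-frequency distributions as follows. This assumes the skew divergence form s_α(p, q) = D(q ‖ αp + (1−α)q) with α = 0.99, symmetric skew as the sum of both directions, and minimum skew as the smaller of the two; the paper's exact definitions and α may differ. Lower divergence means more similar documents. The `second_order` helper illustrates the Pearson-correlation step on a toy similarity matrix.

```python
import numpy as np

def kl(p, q):
    """KL divergence D(p || q); terms with p_i = 0 contribute 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def jensen_shannon(p, q):
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def skew(p, q, alpha=0.99):
    # D(q || alpha*p + (1-alpha)*q): mixing in a little of q keeps the
    # second argument nonzero wherever q > 0, so the divergence stays finite.
    return kl(q, alpha * p + (1 - alpha) * q)

def symmetric_skew(p, q, alpha=0.99):
    return skew(p, q, alpha) + skew(q, p, alpha)

def minimum_skew(p, q, alpha=0.99):
    return min(skew(p, q, alpha), skew(q, p, alpha))

def second_order(sim_matrix):
    # Pearson correlation between rows: two documents are similar if their
    # similarity profiles against all documents are correlated.
    return np.corrcoef(sim_matrix)

def to_dist(counts):
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()

doc1 = to_dist([3, 0, 1, 2])  # toy term-frequency vectors
doc2 = to_dist([2, 1, 0, 2])
for f in (jensen_shannon, symmetric_skew, minimum_skew):
    print(f.__name__, round(f(doc1, doc2), 4))
```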

A Framework to Evaluate Communication Quality of Operators in Nuclear Power Plants Using Cosine Similarity (코사인 유사도를 이용한 원자력발전소 운전원 커뮤니케이션 품질 평가 프레임워크)

  • Kim, Seung-Hwan;Park, Jin-Kyun;Han, Sang-Yong
    • Journal of the Korea Society of Computer and Information / v.15 no.9 / pp.165-172 / 2010
  • Communication problems have been regarded as one of the biggest causes of trouble in many industries, which has led to extensive research on communication as part of human error analysis. Existing research has shown that maintaining good communication quality is essential to securing the safety of large, complex process systems. In this paper, we suggest a method to measure the quality of communication during off-normal situations in the main control room of nuclear power plants. The method uses cosine similarity, which measures how similar two operators' sentences are as the cosine of the angle between their vector representations. To check whether the method is applicable to evaluating communication quality, we compared the results of the communication quality analysis with the performance of operators working in a simulated environment.
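
The "cosine of the angle" between two sentences can be sketched by turning each sentence into a term-count vector over a shared vocabulary. The abstract does not describe the tokenization or weighting; whitespace tokenization and raw counts are assumptions here (Korean utterances would normally need morphological analysis first), and the two utterances are hypothetical.

```python
import numpy as np

def sentence_vectors(sent_a, sent_b):
    """Term-count vectors for two sentences over their joint vocabulary."""
    tokens_a, tokens_b = sent_a.lower().split(), sent_b.lower().split()
    vocab = sorted(set(tokens_a) | set(tokens_b))
    va = np.array([tokens_a.count(w) for w in vocab], dtype=float)
    vb = np.array([tokens_b.count(w) for w in vocab], dtype=float)
    return va, vb

def cosine_and_angle(va, vb):
    """cos(theta) = (a . b) / (|a| |b|), plus the angle itself in degrees."""
    cos = float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))
    angle = float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
    return cos, angle

order = "open valve two on the charging pump"   # hypothetical instruction
readback = "open valve two on charging pump"    # hypothetical read-back
cos, angle = cosine_and_angle(*sentence_vectors(order, readback))
print(f"cos={cos:.3f}, angle={angle:.1f} deg")
```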

Measures of Abnormal User Activities in Online Comments Based on Cosine Similarity (코사인 유사도 기반의 인터넷 댓글 상 이상 행위 분석 방법)

  • Kim, Minjae;Lee, Sangjin
    • Journal of the Korea Institute of Information Security & Cryptology / v.24 no.2 / pp.335-343 / 2014
  • It is increasingly important to ensure the credibility of internet media, which influence public opinion. However, cases of anonymous manipulation of online reviews have raised vague public suspicions. In this study, we explore the possibility of manipulating public opinion on online websites. We investigate the characteristics of comments posted by users on websites and compare comments with one another using the cosine similarity function. Our results show the following. First, we found a correlation between the similarity of comments and the rank of articles on the websites. Second, it is possible to identify abnormal user activities such as excessive multiple posting, double posting, and astroturfing.
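
A pairwise comparison that flags near-duplicate comments (the double-posting pattern) might look like the sketch below. The comment representation, the 0.9 threshold, and the helper `flag_similar_pairs` are hypothetical; the abstract does not state how comments were vectorized or what cut-off was used.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def flag_similar_pairs(comments, threshold=0.9):
    """Return (i, j, score) for comment pairs with cosine similarity >= threshold."""
    vectors = [Counter(c.lower().split()) for c in comments]
    flagged = []
    for i, j in combinations(range(len(comments)), 2):
        score = cosine(vectors[i], vectors[j])
        if score >= threshold:
            flagged.append((i, j, round(score, 3)))
    return flagged

comments = [  # toy comments
    "this policy is a disaster for everyone",
    "This policy is a disaster for everyone",
    "I think the article misses the main point",
]
print(flag_similar_pairs(comments))
```

Comparing all pairs is quadratic in the number of comments, which is manageable per article but would need blocking or approximate search at site scale.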

New Approach of Evaluating Poomsae Performance with Inertial Measurement Unit Sensors (관성센서를 활용한 새로운 품새 경기력 평가 방법 연구)

  • Kim, Young-Kwan
    • Korean Journal of Applied Biomechanics / v.31 no.3 / pp.199-204 / 2021
  • Objective: The purpose of this study was to present a new methodology for evaluating Poomsae performance with inertial measurement unit (IMU) sensors using signal processing techniques. Method: Ten collegiate Taekwondo athletes, consisting of five Poomsae elite athletes (age: 21.4 ± 0.9 years, height: 168.4 ± 11.3 cm, weight: 65.0 ± 10.6 kg, experience: 12 ± 0.7 years) and five breaking demonstration athletes (age: 21.0 ± 0.0 years, height: 168.4 ± 4.7 cm, weight: 63.8 ± 8.2 kg, experience: 13.0 ± 2.1 years), voluntarily participated in this study. Each athlete performed three black-belt Poomsae (Goryeo, Geumgang, and Taebaek) twice. The repeated wrist and ankle motion data were compared using cosine similarity and Euclidean distance. Results: The Poomsae athletes showed superior performance in temporal consistency for Goryeo and Taebaek Poomsae, in cosine similarity for Geumgang and Taebaek Poomsae, and in Euclidean distance for Geumgang Poomsae. Conclusion: IMU sensors could be a useful tool for monitoring and evaluating within-subject temporal variability of Taekwondo Poomsae motions. The method also distinguished the spatiotemporal characteristics of the three Poomsae.
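
Comparing two repeated trials with both measures can be sketched as below. It assumes the two IMU signals have already been time-normalized to the same number of samples (the abstract does not describe preprocessing), and the sine traces are toy stand-ins for wrist data.

```python
import numpy as np

def compare_trials(trial_a, trial_b):
    """Cosine similarity and Euclidean distance between two equal-length signals."""
    a = np.asarray(trial_a, dtype=float).ravel()
    b = np.asarray(trial_b, dtype=float).ravel()
    if a.shape != b.shape:
        raise ValueError("trials must be resampled to the same length first")
    cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    dist = float(np.linalg.norm(a - b))
    return cos, dist

t = np.linspace(0, 2 * np.pi, 200)
trial_1 = np.sin(t)                # toy wrist angular-velocity trace
trial_2 = 0.9 * np.sin(t + 0.1)    # a slightly smaller, slightly shifted repeat
cos, dist = compare_trials(trial_1, trial_2)
print(f"cosine={cos:.3f}, euclidean={dist:.3f}")
```

Cosine similarity ignores the 0.9 amplitude scaling and responds only to the shape of the movement, while Euclidean distance penalizes both, which is one reason to report the two together.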

Multi-Modal Based Malware Similarity Estimation Method (멀티모달 기반 악성코드 유사도 계산 기법)

  • Yoo, Jeong Do;Kim, Taekyu;Kim, In-sung;Kim, Huy Kang
    • Journal of the Korea Institute of Information Security & Cryptology / v.29 no.2 / pp.347-363 / 2019
  • Like DNA in living things, each piece of malware has its own unique behavioral characteristics. To respond to APT (Advanced Persistent Threat) attacks in advance, these behavioral characteristics must be extracted from malware, and each malware sample must be classified by behavioral similarity. In this paper, various similarities between Windows malware samples are estimated, and the malware family is predicted from these similarity values. The similarity measures used are 'TF-IDF cosine similarity', 'Nilsimsa similarity', 'malware function cosine similarity', and 'Jaccard similarity'. The prediction rate differs widely across the similarity measures. Although no single similarity measure classifies malware with high accuracy, this result can help in selecting a similarity measure for classifying a specific malware family.
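
Two of the four measures, TF-IDF cosine and Jaccard, can be sketched with 1-nearest-neighbour family prediction. Treating API-call names as the TF-IDF terms is an assumption (the abstract does not say what the terms were), the smoothed idf is just one common variant, `predict_family` is a hypothetical helper, and the traces are toy data rather than real malware. Nilsimsa and the function-level cosine are not reproduced.

```python
from collections import Counter
from math import log, sqrt

def tfidf_vectors(docs):
    """TF-IDF weights per document (smoothed idf, one common variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def predict_family(query, labeled, measure="tfidf"):
    """Hypothetical 1-NN predictor: return (family, score) of the most similar sample."""
    if measure == "tfidf":
        vecs = tfidf_vectors([seq for seq, _ in labeled] + [query])
        scores = [cosine(vecs[-1], v) for v in vecs[:-1]]
    else:
        scores = [jaccard(query, seq) for seq, _ in labeled]
    best = max(range(len(scores)), key=scores.__getitem__)
    return labeled[best][1], scores[best]

# Toy API-call traces (illustrative, not real malware data).
labeled = [
    (["CreateFileW", "WriteFile", "RegSetValueExW", "CreateFileW"], "family_A"),
    (["InternetOpenA", "InternetConnectA", "HttpSendRequestA"], "family_B"),
]
query = ["CreateFileW", "WriteFile", "RegSetValueExW"]
print(predict_family(query, labeled, "tfidf"))
print(predict_family(query, labeled, "jaccard"))
```

Jaccard looks only at which calls occur, whereas TF-IDF cosine also weights how often they occur and how rare they are across the corpus, which is one reason the two can rank candidate families differently.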

Similarity-based Damage Detection in Offshore Jacket Structures (유사도 기반 해양 자켓 구조물 손상추정)

  • Min, Cheon-Hong;Kim, Hyung-Woo;Park, Sanghyun;Oh, Jae-Won;Nam, Bo-Woo
    • Journal of Ocean Engineering and Technology / v.30 no.4 / pp.287-293 / 2016
  • This paper presents an effective damage detection method for offshore jackets using natural frequency change ratios. Two parameters, cosine similarity and a magnitude index, are used to estimate the location and severity of damage in the structure. A numerical jacket structure model is used to verify the performance of the proposed method, and the analysis shows that the damage in the structure is detected accurately.
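
The location step can be sketched as matching the measured pattern of natural frequency change ratios against simulated patterns for each candidate member. The signature values, member names, and the severity estimate (a least-squares scale factor, used here as a hypothetical stand-in for the paper's magnitude index, whose definition the abstract does not give) are all assumptions.

```python
import numpy as np

def change_ratios(f_intact, f_damaged):
    f_intact = np.asarray(f_intact, dtype=float)
    return (f_intact - np.asarray(f_damaged, dtype=float)) / f_intact

def locate_damage(measured, signatures):
    """Rank candidate locations by cosine similarity of change-ratio patterns.

    signatures: dict mapping location -> change-ratio vector simulated for a
    reference damage at that location (hypothetical model output).
    """
    m = np.asarray(measured, dtype=float)
    scores = {}
    for loc, s in signatures.items():
        s = np.asarray(s, dtype=float)
        scores[loc] = float(m @ s / (np.linalg.norm(m) * np.linalg.norm(s)))
    best = max(scores, key=scores.get)
    # Hypothetical severity estimate: least-squares scale of the best signature,
    # i.e. how many "reference damages" best explain the measured ratios.
    s = np.asarray(signatures[best], dtype=float)
    severity = float(m @ s / (s @ s))
    return best, scores, severity

f_intact = [1.20, 1.35, 2.80, 3.10]    # toy natural frequencies (Hz)
f_damaged = [1.19, 1.32, 2.79, 3.04]
signatures = {                          # toy change ratios for a reference damage
    "leg_1": [0.02, 0.04, 0.00, 0.04],
    "brace_3": [0.00, 0.01, 0.05, 0.00],
}
best, scores, severity = locate_damage(change_ratios(f_intact, f_damaged), signatures)
print(best, {k: round(v, 3) for k, v in scores.items()}, round(severity, 2))
```

Because cosine similarity is scale-invariant, it isolates the location (the shape of the pattern) from the severity (its size), which is why a separate magnitude parameter is needed.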

A Study on the Optimal Search Keyword Extraction and Retrieval Technique Generation Using Word Embedding (워드 임베딩(Word Embedding)을 활용한 최적의 키워드 추출 및 검색 방법 연구)

  • Jeong-In Lee;Jin-Hee Ahn;Kyung-Taek Koh;YoungSeok Kim
    • Journal of the Korean Geosynthetics Society / v.22 no.2 / pp.47-54 / 2023
  • In this paper, we propose a technique for optimal search keyword extraction and retrieval for news article classification. The technique was verified through a case study of identifying trends related to construction in North Korea. BigKinds, a representative Korean media platform, was used to select sample articles and extract keywords. The extracted keywords were vectorized using word embedding, and the similarity between them was examined using cosine similarity. Words with a similarity of 0.5 or higher were then clustered based on the 10 most frequent keywords. Following the BigKinds search form, keywords within a cluster were joined with 'OR' and clusters were joined with 'AND'. An in-depth analysis confirmed that the extracted articles were meaningful and appropriate for the original purpose. This work is significant in that it makes it possible to classify news articles for a user's specific purpose without modifying the existing classification system and search form.
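
The clustering and query-building steps can be sketched as follows. The 3-dimensional vectors are toy stand-ins for trained word embeddings, the seed list stands in for the top-frequency keywords, and the `(a OR b) AND (c OR d)` string is an assumed rendering of the BigKinds search form rather than its documented syntax. In this simple scheme a word could fall into more than one cluster; the abstract does not say how the paper handles that.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def build_query(embeddings, seeds, threshold=0.5):
    """Group each seed keyword with words whose cosine similarity >= threshold,
    then join words inside a group with OR and groups with AND."""
    clusters = []
    for seed in seeds:
        group = [seed] + [w for w in embeddings
                          if w != seed and cosine(embeddings[seed], embeddings[w]) >= threshold]
        clusters.append(group)
    return " AND ".join("(" + " OR ".join(g) + ")" for g in clusters)

# Toy 3-d vectors standing in for trained word embeddings.
embeddings = {
    "DPRK": np.array([0.9, 0.1, 0.0]),
    "Pyongyang": np.array([0.8, 0.2, 0.1]),
    "construction": np.array([0.1, 0.9, 0.2]),
    "apartment": np.array([0.2, 0.8, 0.3]),
}
print(build_query(embeddings, seeds=["DPRK", "construction"]))
```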

A Comparative Study of Teachers' and Students' Preference of Socio-Scientific Issues Topics (교사와 학생의 사회적-과학적 쟁점(Socio-Scientific Issues) 주제 선호도 분석)

  • Hyun Ju Park
    • Journal of Science Education / v.47 no.2 / pp.180-191 / 2023
  • The purpose of this study was to investigate the SSI topics preferred by students and teachers in elementary, middle, and high schools, and to analyze, by school level, the similarity between students' and teachers' preferred SSI topics using the cosine similarity measure. A total of 566 students and 327 teachers from elementary, middle, and high schools participated. Sixty topics in the areas of environment, science and technology, health and medicine, and other social issues were identified from the literature and existing SSI programs. Students and teachers were asked to select their five favorite topics, and the data were collected online using SurveyMonkey. The collected data were divided into six groups (students and teachers at each of the three school levels), and the frequency of topic selection was analyzed within each group. Topic preference similarity was analyzed by constructing vectors from the topic selection frequencies and measuring the cosine similarity among student groups, among teacher groups, and between students and teachers at each school level. The results are as follows. First, among student cohorts, the cosine similarity of preferred SSI topics was higher between middle and high school students (0.982) than between elementary and middle school students (0.651) or between elementary and high school students (0.662). Second, among teacher cohorts, the cosine similarity of preferred SSI topics was similar for all comparisons between elementary, middle, and high school teachers. Third, the similarity between students and teachers was higher at the elementary level (0.974) than at the middle school (0.621) or high school (0.645) level. Access to topics that interest students in SSI education is strongly associated with motivation and persistence in learning, an enjoyable learning experience, and positive attitudes toward learning.
Therefore, when designing SSI lessons, it is important to examine topics from the perspective of student interest. In particular, when a teacher has selected SSI topics that differ from students' preferences, careful instructional design will be needed to bridge the gap.
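
The group-level comparison reduces to the cosine between two topic-frequency vectors indexed by the same ordered topic list. The counts below are invented for illustration (the study used 60 topics; only 4 are shown) and do not reproduce the reported values.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Hypothetical selection counts for the same ordered list of topics.
middle_students = [40, 25, 10, 5]
high_students = [38, 22, 12, 6]
elementary_students = [5, 10, 30, 35]
print(round(cosine(middle_students, high_students), 3))
print(round(cosine(elementary_students, middle_students), 3))
```

Because the vectors are normalized, groups of different sizes (566 students versus 327 teachers) can be compared directly: only the relative distribution of choices across topics matters.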

Design and Implementation of Computer Engineering Technical Interview Support System (컴퓨터 공학 기술 면접 지원 시스템의 설계 및 구현)

  • Dong-Hyun Lee;Seung-Min Park;Dong-Hyun Kim
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.19 no.3
    • /
    • pp.603-608
    • /
    • 2024
  • Recently, computer engineering technical interviews have become more common in developer hiring, and the burden on interviewees has increased accordingly. However, when practicing for such interviews alone, it is difficult to judge whether one's answers are correct or whether one is speaking at an appropriate speed. In this paper, we propose a computer engineering technical interview support system based on similarity measurement. The proposed system measures the technical accuracy of the interviewee's answers through a sentence similarity evaluation procedure using cosine similarity. It also measures the interviewee's speech rate and reports it back to the interviewee.
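
The two measurements can be sketched as a bag-of-words cosine between an answer and a model answer, plus a simple words-per-minute rate. Both representations are assumptions: the abstract does not say how sentences were vectorized or in what unit speech rate is reported (for Korean, syllables per minute may be more natural), and the example answer is hypothetical.

```python
from collections import Counter
from math import sqrt

def answer_similarity(answer, reference):
    """Cosine similarity between bag-of-words vectors of an answer and a model answer."""
    a, r = Counter(answer.lower().split()), Counter(reference.lower().split())
    dot = sum(a[w] * r[w] for w in a.keys() & r.keys())
    na = sqrt(sum(c * c for c in a.values()))
    nr = sqrt(sum(c * c for c in r.values()))
    return dot / (na * nr) if na and nr else 0.0

def speech_rate(transcript, duration_seconds):
    """Words per minute (hypothetical metric; the paper's unit is not stated)."""
    return len(transcript.split()) / (duration_seconds / 60)

reference = "a process has its own memory space while threads share the memory of their process"
answer = "threads share the memory of their process but a process has its own memory"
print(round(answer_similarity(answer, reference), 3), speech_rate(answer, 6.0))
```

A bag-of-words cosine ignores word order, so a reordered answer scores as highly as the original; sentence embeddings would be needed to capture paraphrases that use different words.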