• Title/Summary/Keyword: Cosine similarity

Search results: 189 (processing time: 0.02 seconds)

Development of An Automatic Classification System for Game Reviews Based on Word Embedding and Vector Similarity (단어 임베딩 및 벡터 유사도 기반 게임 리뷰 자동 분류 시스템 개발)

  • Yang, Yu-Jeong; Lee, Bo-Hyun; Kim, Jin-Sil; Lee, Ki Yong
    • The Journal of Society for e-Business Studies / v.24 no.2 / pp.1-14 / 2019
  • Because of the characteristics of game software, it is important to quickly identify users' needs and reflect them in the game after its launch. However, most sites where users can download games and post reviews, such as the Google Play Store, provide only very limited and ambiguous classification categories for game reviews. In this paper, we therefore develop an automatic classification system that sorts game reviews into categories that are clearer and more useful for game providers. The system converts the words in each review into vectors using word2vec, a representative word embedding model, and assigns the review to the most relevant categories by measuring the similarity between those vectors and each category. Because the choice of similarity measure directly affects classification performance, we compared three representative measures, Euclidean similarity, cosine similarity, and extended Jaccard similarity, in a real environment. Furthermore, to allow a review to belong to multiple categories, we use a threshold-based multi-category classification method. Experiments on real reviews collected from the Google Play Store show that the system achieves up to 95% accuracy.
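The three similarity measures and the threshold rule can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' system: the review vector is taken as the mean of its word vectors, each category is represented by a single vector (for example, one built from seed keywords), Euclidean similarity uses the common 1 / (1 + distance) form, and the 0.5 threshold and the toy 2-D vectors are hypothetical stand-ins for real word2vec embeddings.

```python
import math


def cosine(x, y):
    """Cosine similarity: x.y / (|x| |y|)."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def euclidean_sim(x, y):
    """One common Euclidean similarity: 1 / (1 + distance), in (0, 1]."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return 1.0 / (1.0 + d)


def extended_jaccard(x, y):
    """Extended Jaccard (Tanimoto): x.y / (|x|^2 + |y|^2 - x.y)."""
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y) - dot
    return dot / denom if denom else 0.0


def classify(review_words, word_vecs, category_vecs, sim=cosine, threshold=0.5):
    """Return every category whose similarity passes the threshold, best first.

    Assumption: the review is represented by the mean of its known word vectors.
    """
    known = [word_vecs[w] for w in review_words if w in word_vecs]
    if not known:
        return []
    review_vec = [sum(col) / len(known) for col in zip(*known)]
    scores = {c: sim(review_vec, v) for c, v in category_vecs.items()}
    labels = sorted((c for c in scores if scores[c] >= threshold), key=lambda c: -scores[c])
    # Fall back to the single best category if nothing passes the threshold.
    return labels or [max(scores, key=scores.get)]


# Toy 2-D "embeddings" (hypothetical values, not word2vec output).
word_vecs = {"lag": [1.0, 0.0], "crash": [0.9, 0.1], "price": [0.0, 1.0]}
category_vecs = {"bug": [1.0, 0.0], "payment": [0.0, 1.0]}
print(classify(["lag", "crash"], word_vecs, category_vecs))  # ['bug']
print(classify(["lag", "price"], word_vecs, category_vecs))  # ['bug', 'payment']
```

Passing `sim=euclidean_sim` or `sim=extended_jaccard` to `classify` is enough to compare the measures on the same data; the fallback to the single best category keeps every review labeled even when no score clears the threshold.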

Mining Semantically Similar Tags from Delicious (딜리셔스에서 유사태그 추출에 관한 연구)

  • Yi, Kwan
    • Journal of the Korean Society for information Management / v.26 no.2 / pp.127-147 / 2009
  • The synonym problem is an inherent barrier in human-computer communication, and it is even more challenging in Web 2.0 applications, especially social tagging applications. To help resolve it, this study tests the feasibility of using a Web 2.0 application as a source of synonyms. It investigates a way of identifying similar tags in Delicious, a popular collaborative tagging application. Specifically, we propose an algorithm, FolkSim, for measuring the similarity of social tags from Delicious. We compared FolkSim with a cosine-based similarity method (CosSim) and observed that the top-ranked tags in the similar-tag lists generated by FolkSim tend to be among the best similar tags available in the given choices. These lists also appear to be relatively better than those created by CosSim. We further observed that the tag folksonomy and the similar-tag lists resemble each other to a certain degree, so the folksonomy could possibly serve as an alternative, especially when a FolkSim-based list is unavailable or infeasible.
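FolkSim itself is not specified in the abstract, so only the cosine baseline (CosSim) is sketched here. The assumption is that each tag is a sparse vector of counts over the URLs it was applied to; the original study may build tag vectors differently (for example, over users or co-occurring tags), and the bookmark data below is invented for illustration.

```python
import math
from collections import defaultdict


def tag_vectors(bookmarks):
    """Build tag -> {url: count} from (url, tag) pairs (assumed representation)."""
    vecs = defaultdict(lambda: defaultdict(int))
    for url, tag in bookmarks:
        vecs[tag][url] += 1
    return vecs


def cosine_sparse(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def similar_tags(tag, vecs, k=5):
    """Rank the other tags by cosine similarity to `tag`, most similar first."""
    target = vecs.get(tag, {})  # .get does not create a new entry in the defaultdict
    scored = [(other, cosine_sparse(target, v)) for other, v in vecs.items() if other != tag]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Invented bookmarks: (url, tag) pairs.
bookmarks = [("u1", "python"), ("u1", "programming"), ("u2", "python"),
             ("u2", "programming"), ("u3", "python"), ("u3", "snake"), ("u4", "cooking")]
vecs = tag_vectors(bookmarks)
print(similar_tags("python", vecs, k=2))  # programming ranks first, then snake
```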

Recommendation System using Associative Web Document Classification by Word Frequency and α-Cut (단어 빈도와 α-cut에 의한 연관 웹문서 분류를 이용한 추천 시스템)

  • Jung, Kyung-Yong; Ha, Won-Shik
    • The Journal of the Korea Contents Association / v.8 no.1 / pp.282-289 / 2008
  • Although there have been technological developments that improve collaborative filtering, they have yet to fully reflect the actual relationships among items. In this paper, we propose a recommendation system that uses associative web document classification based on word frequency and α-cut to address the shortcomings of collaborative filtering. The proposed method extracts words from web documents through morphological analysis and accumulates term-frequency weights. It generates association rules with the Apriori algorithm and applies the term-frequency weights to their confidence. It then calculates the similarity among words using hypergraph partitioning. Finally, it classifies related web documents using an α-cut and calculates similarity using adjusted cosine similarity. The results show that the proposed method significantly outperforms existing methods.
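The adjusted cosine similarity named in the last step, and the α-cut used for classification, can be sketched as follows. Both functions follow the usual textbook definitions; how the paper derives the ratings and the fuzzy membership values (from term-frequency weights, association-rule confidence, and hypergraph partitioning) is not described in the abstract, so the inputs here are hypothetical.

```python
import math


def adjusted_cosine(ratings, i, j):
    """Adjusted cosine similarity between items i and j.

    ratings: user -> {item: score}. Each co-rating user's mean score is
    subtracted first, which removes differences in individual rating scales.
    """
    num = den_i = den_j = 0.0
    for user_ratings in ratings.values():
        if i in user_ratings and j in user_ratings:
            mean = sum(user_ratings.values()) / len(user_ratings)
            di, dj = user_ratings[i] - mean, user_ratings[j] - mean
            num += di * dj
            den_i += di * di
            den_j += dj * dj
    if den_i == 0 or den_j == 0:
        return 0.0
    return num / (math.sqrt(den_i) * math.sqrt(den_j))


def alpha_cut(memberships, alpha):
    """Standard fuzzy alpha-cut: keep elements whose membership is >= alpha."""
    return {doc for doc, mu in memberships.items() if mu >= alpha}


# Hypothetical ratings and membership values.
ratings = {
    "u1": {"a": 5, "b": 4, "c": 1},
    "u2": {"a": 4, "b": 5, "c": 2},
    "u3": {"a": 1, "b": 2, "c": 5},
}
print(adjusted_cosine(ratings, "a", "b") > 0)  # True: a and b move together
print(alpha_cut({"d1": 0.9, "d2": 0.4, "d3": 0.7}, 0.6))
```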

Digital Watermarking Technique using self-similarity (자기유사성을 이용한 디지털 워터마킹 기법)

  • Lee, Mun-Hee; Lee, Young-hee
    • The Journal of Korean Association of Computer Education / v.6 no.4 / pp.37-47 / 2003
  • In this paper, we propose a new digital watermarking technique for protecting image ownership that uses the self-similarity of DCT (Discrete Cosine Transform) coefficients. Similar coefficients are clustered by a SOM (Self-Organizing Map), a neural network model, and the watermark is inserted into a cluster selected from among these coefficient clusters. In general, a watermark inserted in the high-frequency regions of an image is removed by compression such as JPEG, while a watermark inserted in the low-frequency regions degrades image quality. Therefore, the watermark is inserted into the cluster that has many coefficients in the middle-frequency regions. Because the watermark is inserted according to the number of coefficients in the selected cluster, the algorithm reduces image-quality distortion. To extract the watermark from the watermarked image, the selected cluster is used, without requiring the original image. In experiments, the proposed algorithm preserved image quality well and was highly robust against attacks (JPEG compression, filtering, zoom-in, zoom-out, cropping, and noise).
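The reason for targeting middle frequencies can be illustrated with an 8x8 block DCT. This sketch is a simplification, not the paper's method: it replaces SOM-based clustering with a fixed mid-frequency band (coefficients with 3 <= u + v <= 6, an assumed choice), embeds each bit as +/-`strength` on one coefficient, leaves the DC and low-frequency coefficients untouched, and omits extraction. `embed_mid_band` and its parameters are hypothetical names.

```python
import math

N = 8  # block size used by JPEG-style DCT processing


def dct_basis(n=N):
    """Orthonormal DCT-II basis matrix C, so that coeffs = C X C^T."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


C = dct_basis()
CT = transpose(C)


def dct2(block):
    return matmul(matmul(C, block), CT)


def idct2(coeffs):
    # C is orthonormal, so its inverse is its transpose.
    return matmul(matmul(CT, coeffs), C)


def embed_mid_band(block, bits, strength=4.0, band=(3, 6)):
    """Hypothetical embedder: shift mid-band coefficients by +/-strength, one per bit.

    Assumption: a fixed band (band[0] <= u + v <= band[1]) replaces the paper's
    SOM-selected cluster. DC and low-frequency coefficients are not modified.
    """
    coeffs = dct2(block)
    positions = [(u, v) for u in range(N) for v in range(N) if band[0] <= u + v <= band[1]]
    for (u, v), bit in zip(positions, bits):
        coeffs[u][v] += strength if bit else -strength
    return idct2(coeffs), positions


block = [[(3 * i + 5 * j) % 17 for j in range(N)] for i in range(N)]
marked, positions = embed_mid_band(block, [1, 0, 1, 1])
print(positions[:4])  # [(0, 3), (0, 4), (0, 5), (0, 6)]
```

A real embedder would also round the marked block back to integer pixel values, which perturbs the embedded coefficients slightly.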

The Classification of Arrhythmia Using Similarity Analysis Between Unit Patterns at ECG Signal (ECG 신호에서 단위패턴간 유사도분석을 이용한 부정맥 분류 알고리즘)

  • Bae, Jung-Hyoun; Lim, Seung-Ju; Kim, Jeong-Ju; Park, Sung-Dae; Kim, Jeong-Do
    • The KIPS Transactions:PartD / v.19D no.1 / pp.105-112 / 2012
  • Most methods for detecting premature ventricular contractions (PVC) and atrial premature contractions (APC) require accurate measurement of the QRS complex, P wave, and T wave. In this study, we propose a new algorithm for detecting PVC and APC without complex parameters or algorithms. The proposed algorithm is widely applicable to abnormal waveforms that vary between individuals, as well as to all kinds of normal ECG waveforms. To achieve this, we separate the ECG signal into unit patterns and build a standard unit pattern using only the unit patterns that have a normal R-R interval. We then detect PVC and APC through similarity analysis, matching the standard unit pattern against each unit pattern.
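The template-matching idea can be sketched on a synthetic signal. Several details are assumptions rather than the paper's: R-peak positions are given, a unit pattern is a fixed window around each R peak, a "normal R-R interval" is one within 15% of the median, shape similarity is the Pearson correlation, and 0.9 is an arbitrary threshold. Separating PVC from APC is left to two flags (abnormal shape, premature timing) instead of the authors' rule.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation, used here as the (assumed) shape-similarity measure."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0


def analyze_beats(signal, r_peaks, pre=2, post=3, rr_tol=0.15, sim_threshold=0.9):
    """Return (r_peak, similarity, abnormal_shape, premature) for each unit pattern."""
    rr = [b - a for a, b in zip(r_peaks, r_peaks[1:])]
    median_rr = statistics.median(rr)
    # Assumed unit pattern: a fixed window around each R peak (edge beats are skipped).
    beats = [(k, signal[r - pre:r + post]) for k, r in enumerate(r_peaks)
             if r - pre >= 0 and r + post <= len(signal)]
    # Beat k > 0 is preceded by the interval rr[k - 1]; only normal-RR beats form the template.
    normal = [seg for k, seg in beats
              if k > 0 and abs(rr[k - 1] - median_rr) <= rr_tol * median_rr]
    if not normal:
        raise ValueError("no beats with a normal R-R interval to build a template")
    template = [statistics.fmean(col) for col in zip(*normal)]
    results = []
    for k, seg in beats:
        sim = pearson(template, seg)
        premature = k > 0 and rr[k - 1] < (1 - rr_tol) * median_rr
        results.append((r_peaks[k], sim, sim < sim_threshold, premature))
    return results


# Synthetic signal: a 5-sample "normal" shape at most peaks, an inverted one at 37.
normal_shape = [0.0, 1.0, 5.0, 1.0, 0.0]
signal = [0.0] * 60
r_peaks = [10, 20, 30, 37, 50]
for r in r_peaks:
    signal[r - 2:r + 3] = [-s for s in normal_shape] if r == 37 else normal_shape
res = analyze_beats(signal, r_peaks)
for r, sim, abnormal, premature in res:
    print(r, round(sim, 3), abnormal, premature)
```

On this synthetic data, only the inverted, early beat at sample 37 is flagged, and it is flagged on both counts.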

Content Recommendation Techniques for Personalized Software Education (개인화된 소프트웨어 교육을 위한 콘텐츠 추천 기법)

  • Kim, Wan-Seop
    • Journal of Digital Convergence / v.17 no.8 / pp.95-104 / 2019
  • Recently, software education has been emphasized as a key element of the fourth industrial revolution, and many universities are strengthening software education for all students to meet the needs of the times. Online content is an effective way to deliver software education to all students. However, providing uniform online content has the limitation that it ignores students' individual characteristics (major, interest in software, comprehension, other interests, etc.). In this study, we propose a recommendation method that uses the directional similarity between contents in an environment where view history is recorded as boolean data. We propose a new item-based recommendation formula that uses the confidence value from association rule analysis as the similarity, and we apply it to data from a domestic paid content site. Experimental results show that recommendation accuracy improves compared with traditional collaborative recommendation that uses cosine or Jaccard similarity.
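The contrast between directional and symmetric similarity on boolean view data can be sketched as follows. Confidence conf(A → B) = |V(A) ∩ V(B)| / |V(A)|, where V(X) is the set of users who viewed X, is the standard association-rule definition; scoring a candidate by summing confidences from the items a user has already viewed is an assumption, since the abstract does not give the paper's exact recommendation formula. The view history is invented.

```python
import math
from collections import defaultdict


def item_viewers(history):
    """history: user -> set of viewed items. Returns item -> set of users."""
    viewers = defaultdict(set)
    for user, items in history.items():
        for item in items:
            viewers[item].add(user)
    return viewers


def confidence(viewers, a, b):
    """Directional similarity conf(a -> b) = |V(a) & V(b)| / |V(a)|."""
    va, vb = viewers.get(a, set()), viewers.get(b, set())
    return len(va & vb) / len(va) if va else 0.0


def cosine_bool(viewers, a, b):
    va, vb = viewers.get(a, set()), viewers.get(b, set())
    return len(va & vb) / math.sqrt(len(va) * len(vb)) if va and vb else 0.0


def jaccard(viewers, a, b):
    va, vb = viewers.get(a, set()), viewers.get(b, set())
    union = va | vb
    return len(va & vb) / len(union) if union else 0.0


def recommend(history, user, sim=confidence, k=3):
    """Assumed scoring: sum of sim(seen item -> candidate) over the user's viewed items."""
    viewers = item_viewers(history)
    seen = history[user]
    scores = {j: sum(sim(viewers, i, j) for i in seen) for j in viewers if j not in seen}
    return sorted(scores, key=lambda j: (-scores[j], j))[:k]


history = {
    "u1": {"intro", "python"},
    "u2": {"intro", "python", "ml"},
    "u3": {"intro", "web"},
    "u4": {"ml"},
    "u5": {"intro"},
}
viewers = item_viewers(history)
print(confidence(viewers, "intro", "python"), confidence(viewers, "python", "intro"))  # 0.5 1.0
print(recommend(history, "u5"))  # ['python', 'ml', 'web']
print(recommend(history, "u5", sim=cosine_bool))  # ['python', 'web', 'ml']
```

Confidence ranks `ml` and `web` equally for `u5`, while cosine favors the rarely viewed `web`, which shows how the choice of measure changes the ranking.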

Semi-automatic Data Fusion Method for Spatial Datasets (공간 정보를 가지는 데이터셋의 준자동 융합 기법)

  • Yoon, Jong-chan; Kim, Han-joon
    • The Journal of Society for e-Business Studies / v.26 no.4 / pp.1-13 / 2021
  • With the development of big data technologies, it has become possible to process vast amounts of data that could not be processed before. Accordingly, establishing an automated data selection and fusion process for realizing big-data-based services has become a necessity, not an option. In this paper, we propose an automation technique that creates meaningful new information by fusing datasets containing spatial information. First, each dataset is embedded using the Node2Vec model and the dataset's keywords. Then, the semantic similarity between every pair of datasets is obtained by calculating the cosine similarity of their embedding vectors. Next, a person selects candidate datasets that have one or more spatial identifiers from among the dataset pairs with relatively high similarity, and the selected pairs are fused and visualized. Through this semi-automatic data fusion process, we show that it can generate significant fused information that cannot be obtained from a single dataset.
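The pairwise similarity step can be sketched once embeddings exist. Node2Vec training is omitted: the vectors below are hypothetical stand-ins for per-dataset embeddings, and `min_sim` is an assumed cutoff for shortlisting pairs for the human review step.

```python
import math
from itertools import combinations


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def rank_dataset_pairs(embeddings, min_sim=0.0):
    """Score every dataset pair by cosine similarity, highest first."""
    pairs = []
    for (a, va), (b, vb) in combinations(sorted(embeddings.items()), 2):
        score = cosine(va, vb)
        if score >= min_sim:  # assumed shortlist cutoff before human selection
            pairs.append((a, b, score))
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs


# Hypothetical embeddings (assumed precomputed, e.g. by a Node2Vec model).
embeddings = {
    "bus_stops": [1.0, 0.9, 0.0],
    "subway_stations": [0.9, 1.0, 0.1],
    "restaurant_ratings": [0.0, 0.1, 1.0],
}
for a, b, score in rank_dataset_pairs(embeddings, min_sim=0.5):
    print(f"{a} <-> {b}: {score:.3f}")
```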

Advanced CBS (Cost Breakdown Structure) Code Search Technology Applying NLP (Natural Language Processing) of Artificial Intelligence (인공지능 자연어 처리 기법을 이용한 개선된 내역코드 탐색방법)

  • Kim, HanDo; Nam, JeongYong
    • KSCE Journal of Civil and Environmental Engineering Research / v.44 no.5 / pp.719-731 / 2024
  • Efficient construction management requires linking BIM (building information modeling) with schedule and cost, but the application of 5D BIM is limited by the difficulty of breaking down thousands of WBS (work breakdown structure) and CBS items. To solve this problem, a standardized WBS-CBS set is configured in advance. When a new construction project occurs, each CBS item in its BOQ (bill of quantities) is automatically linked to a WBS item once the most similar text is found among the standard CBS items (the Public Procurement Service standard construction codes) of the already linked set. To compare CBS texts more efficiently, we used artificial-intelligence natural language processing techniques. First, we created a civil term dictionary (CTD) that organizes the words used in civil engineering projects and assigns them numerical values; we then tokenized the text of every CBS item into words defined in the dictionary, converted the tokens into TF-IDF vectors, and judged matches by cosine similarity. In addition, by considering the hierarchical structure of the CBS and changing keywords, the search success rate increased to nearly 70%. The similarity threshold for judging a match was 0.62 (1: perfect match, 0: no match).
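The TF-IDF and cosine matching step with the 0.62 threshold can be sketched as below. Assumptions: a plain set of terms stands in for the civil term dictionary (CTD), tokenization is whitespace splitting, the IDF is the smoothed form log((1 + n) / (1 + df)) + 1 (one common variant; the paper's weighting is not given), and the hierarchical-structure and keyword-changing refinements are not modeled. The CBS texts are invented examples.

```python
import math
from collections import Counter


def tokenize(text, dictionary):
    """Keep only tokens found in the term dictionary (a hypothetical CTD stand-in)."""
    return [tok for tok in text.lower().split() if tok in dictionary]


def tfidf_vectors(token_lists):
    """Raw term counts times an assumed smoothed IDF, log((1 + n) / (1 + df)) + 1."""
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in token_lists]
    return vectors, idf


def cosine_sparse(a, b):
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def best_match(query, standard_texts, dictionary, threshold=0.62):
    """Return (index, score) of the most similar standard text, or (None, score) below threshold."""
    vectors, idf = tfidf_vectors([tokenize(t, dictionary) for t in standard_texts])
    counts = Counter(tokenize(query, dictionary))
    # Terms that never occur in the standard set have no IDF and are ignored.
    q_vec = {t: c * idf[t] for t, c in counts.items() if t in idf}
    score, idx = max((cosine_sparse(q_vec, v), i) for i, v in enumerate(vectors))
    return (idx, score) if score >= threshold else (None, score)


dictionary = {"concrete", "pouring", "rebar", "placing", "formwork", "install", "removal"}
standard = ["Concrete pouring", "Rebar placing", "Formwork install", "Formwork removal"]
print(best_match("concrete pouring (ready-mixed)", standard, dictionary))  # matches index 0
print(best_match("rebar removal", standard, dictionary))  # None: best score (about 0.555) is below 0.62
```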

Modified Generic Mode Coding Scheme for Enhanced Sound Quality of G.718 SWB (G.718 초광대역 코덱의 음질 향상을 위한 개선된 Generic Mode Coding 방법)

  • Cho, Keun-Seok; Jeong, Sang-Bae
    • Phonetics and Speech Sciences / v.4 no.3 / pp.119-125 / 2012
  • This paper describes a new algorithm for encoding the spectral shape and envelope in the generic mode of G.718 super-wideband (SWB). In the G.718 SWB coder, generic mode coding and sinusoidal enhancement are used to quantize the modified discrete cosine transform (MDCT)-based parameters in the high-frequency band. In the generic mode, the high-frequency band is divided into sub-bands, and for every sub-band the best match under the selected similarity criterion is searched for in the coded, envelope-normalized wideband content. To improve quantization in the high-frequency region of speech and audio signals, we propose a modified generic mode that improves on the generic mode of G.718 SWB. The proposed generic mode uses perceptual vector quantization of spectral envelopes and increases the resolution of the spectral copy. The performance of the proposed algorithm is evaluated in terms of objective quality, and experimental results show that it significantly improves sound quality.
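The per-sub-band best-match search can be illustrated generically. This is not the G.718 procedure: the similarity criterion is assumed to be normalized correlation evaluated at every lag in the low band, and the gain is a least-squares scale factor. `best_copy_source` is a hypothetical name and the spectra are toy values.

```python
import math


def normalized_correlation(x, y):
    """Assumed similarity criterion: x.y / (|x| |y|)."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def best_copy_source(lowband, target):
    """Return (lag, similarity, gain) of the low-band segment that best matches target."""
    width = len(target)
    best = (0, -2.0, 0.0)  # similarity starts below the minimum possible value of -1
    for lag in range(len(lowband) - width + 1):
        seg = lowband[lag:lag + width]
        sim = normalized_correlation(seg, target)
        if sim > best[1]:
            energy = sum(s * s for s in seg)
            # Least-squares gain that scales the copied segment onto the target.
            gain = sum(s * t for s, t in zip(seg, target)) / energy if energy else 0.0
            best = (lag, sim, gain)
    return best


# Toy spectra: the target sub-band is twice the low-band segment starting at lag 1.
lowband = [0.1, 0.5, 1.0, 0.5, 0.1, 0.0, 0.2, 0.0]
target = [1.0, 2.0, 1.0]
print(best_copy_source(lowband, target))
```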

Analysis of the effectiveness of the Recommendation Model for the Customized Learning Course (맞춤형 학습코스 추천 모델의 효과분석 방안)

  • Han, Ji-won; Lim, Heui-seok
    • Proceedings of The KACE / 2017.08a / pp.221-224 / 2017
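  • This paper develops a recommendation model that can improve learning outcomes by recommending customized learning courses suited to each user's level, and presents a method for analyzing its effectiveness. Selecting and providing suitable learning topics according to each learner's level and learning content is important, but typical recommendation is human-centered recommendation by expert groups, which is time-consuming and uses resources inefficiently [1]. To overcome this, we used TF-IDF to compute a weight for each word, extracted high-frequency words and placed them in a vector space, and measured the similarity between vectors using cosine similarity. To analyze learner profiles and recommend customized learning courses that take into account the relationships between learning skills, we applied word embedding, using the open-source library Gensim [2]. We also designed an experiment to analyze the effectiveness of the customized learning-course recommendation model and developed an evaluation questionnaire.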
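The TF-IDF keyword extraction and cosine ranking described above can be sketched in Python. The Gensim word-embedding step is left out; the top-k keyword cut, the unsmoothed IDF log(n / df), the learner profile as a plain word list, and the course data are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Per-document TF-IDF weights with the unsmoothed IDF log(n / df) (an assumption)."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    return [{w: c * math.log(n / df[w]) for w, c in Counter(d).items()} for d in docs]


def top_keywords(weights, k):
    """Keep the k highest-weighted words (ties broken alphabetically)."""
    return dict(sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:k])


def cosine(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend_courses(profile_words, course_docs, k=3):
    """Rank courses by cosine similarity between the learner profile and each course's top keywords."""
    names = list(course_docs)
    weights = tfidf([course_docs[n] for n in names])
    course_vecs = {n: top_keywords(w, k) for n, w in zip(names, weights)}
    profile = Counter(profile_words)  # assumed profile: raw word counts
    return sorted(names, key=lambda n: -cosine(profile, course_vecs[n]))


# Hypothetical course descriptions, already tokenized.
course_docs = {
    "intro_python": "python variables loops python basics".split(),
    "data_analysis": "python pandas data analysis data".split(),
    "web_dev": "html css javascript web".split(),
}
print(recommend_courses(["data", "pandas", "python"], course_docs))
# ['data_analysis', 'intro_python', 'web_dev']
```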
