• Title/Summary/Keyword: similarity weight

Semantic Similarity Measures Between Words within a Document using WordNet (워드넷을 이용한 문서내에서 단어 사이의 의미적 유사도 측정)

  • Kang, SeokHoon;Park, JongMin
    • Journal of the Korea Academia-Industrial cooperation Society / v.16 no.11 / pp.7718-7728 / 2015
  • Semantic similarity between words can be applied in many fields, including computational linguistics, artificial intelligence, and information retrieval. In this paper, we present a weighted method for measuring the semantic similarity between words in a document. The method uses the edge distance and depth in WordNet and calculates the semantic similarity between words on the basis of document information, namely word term frequencies (TF) and word concept frequencies (CF). Each word's weight is calculated from its TF and CF in the document. The similarity measure combines the edge distance between words, the depth of their subsumer, and the word weights in the document. We compared our scheme with other methods experimentally, and the proposed method outperformed the other similarity measures. Methods based only on the shortest distance or depth have difficulty representing or merging such information, whereas the proposed method considers the shortest distance, the depth, and the information about words in the document, and thereby improves performance.
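
The ingredients named in this abstract (edge distance, subsumer depth, and a document-derived word weight) can be illustrated with a minimal sketch. The abstract does not give the paper's formula, so the code below swaps in the well-known Wu-Palmer measure and a hypothetical TF-only weight; the toy taxonomy `PARENT`, the function names, and the way the weight scales the similarity are all assumptions, and concept frequency (CF) is omitted.

```python
from collections import Counter

# Toy is-a taxonomy (child -> parent); a hypothetical stand-in for WordNet.
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}

def path_to_root(node):
    """Return the chain of nodes from `node` up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))

def subsumer(a, b):
    """Lowest common ancestor of a and b in the taxonomy."""
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_a:
            return node
    raise ValueError("nodes share no ancestor")

def edge_distance(a, b):
    """Number of edges on the path from a to b through their subsumer."""
    return depth(a) + depth(b) - 2 * depth(subsumer(a, b))

def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(subsumer) / (depth(a) + depth(b))."""
    return 2 * depth(subsumer(a, b)) / (depth(a) + depth(b))

def tf_weights(tokens):
    """Term frequency normalized by the most frequent term (assumed scheme)."""
    counts = Counter(tokens)
    top = max(counts.values())
    return {word: count / top for word, count in counts.items()}

def weighted_similarity(a, b, weights):
    """Hypothetical combination: scale similarity by the mean word weight."""
    return wu_palmer(a, b) * (weights[a] + weights[b]) / 2

weights = tf_weights(["dog", "dog", "cat", "car"])
print(edge_distance("dog", "cat"), wu_palmer("dog", "cat"))
print(weighted_similarity("dog", "cat", weights))
```

Scaling by the mean weight is only one of many possible ways to fold document information into a taxonomy-based score; the paper may combine the terms differently.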

Summarizing the Differences in Chinese-Vietnamese Bilingual News

  • Wu, Jinjuan;Yu, Zhengtao;Liu, Shulong;Zhang, Yafei;Gao, Shengxiang
    • Journal of Information Processing Systems / v.15 no.6 / pp.1365-1377 / 2019
  • Summarizing the differences in Chinese-Vietnamese bilingual news plays an important supporting role in the comparative analysis of news views between China and Vietnam. To address cross-language problems in analyzing the differences between Chinese and Vietnamese bilingual news, we propose a new method of summarizing the differences based on an undirected graph model. The method extracts elements to represent the sentences and builds a bridge between the two languages based on Wikipedia's multilingual concept description pages. First, we calculate the similarity between Chinese and Vietnamese news sentences and filter the bilingual sentences accordingly. Then we use the filtered sentences as nodes and the similarity grade as the edge weight to construct an undirected graph model. Finally, using the random walk algorithm, the weight of each node is calculated from the weights of its edges, and the sentences with the highest weights are extracted as the difference summary. The experimental results show that our proposed approach achieved the highest score of 0.1837 on the annotated test set, outperforming the state-of-the-art summarization models.
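
The graph-ranking step described here (sentences as nodes, similarity as edge weight, node weights from a random walk) can be sketched roughly as follows. The abstract does not specify the walk, so this assumes a PageRank-style iteration with a uniform restart; the function name `rank_nodes`, the damping value 0.85, and the toy similarity matrix are illustrative assumptions, not the authors' settings.

```python
def rank_nodes(weights, damping=0.85, iterations=100):
    """Score nodes of an undirected weighted graph by a damped random walk.

    `weights` is a symmetric matrix; weights[i][j] is the similarity
    between sentence i and sentence j (0 means no edge).
    """
    n = len(weights)
    scores = [1.0 / n] * n
    # Total edge weight attached to each node, used to normalize transitions.
    strength = [sum(row) for row in weights]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            incoming = sum(
                scores[j] * weights[j][i] / strength[j]
                for j in range(n)
                if strength[j] > 0
            )
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return scores

# Toy similarity matrix for three sentences (made-up values).
similarity = [
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.5],
    [0.1, 0.5, 0.0],
]
scores = rank_nodes(similarity)
best = max(range(len(scores)), key=scores.__getitem__)
print(best, scores)
```

In this toy graph the middle sentence has the largest total edge weight, so the walk visits it most often and it would be the first candidate for the summary.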

Robust Image Similarity Measurement based on MR Physical Information

  • Eun, Sung-Jong;Jung, Eun-Young;Park, Dong Kyun;Whangbo, Taeg-Keun
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.9 / pp.4461-4475 / 2017
  • The recent introduction of hospital information systems has remarkably improved the efficiency of health care services within hospitals. As these systems have improved, the integration of medical information has emerged as an issue, and attempts to achieve it have been made. As a preceding step for integration, however, the problem of finding the records of the same patient must be solved first, which requires studies on patient identification algorithms. Typically, similarity is calculated by an MPI (Master Patient Index) module that compares various fields such as the patient's basic information and treatment information, but this approach has many problems, including a language system that is not suited to Korean and the need to estimate an optimal weight for each field. This paper proposes a method that searches for the same patient using MRI information in addition to the patient's field information, as a supplementary method to increase the accuracy of matching algorithms such as MPI. Unlike existing methods that use only image information, when identifying a patient the highest weight was given to the physical information of the medical image, which was treated as an unchangeable unique value; as a result, high accuracy was obtained. We aim to use the similarity measurement result as a secondary measure for identifying patients in the future.

A study on the Prediction Performance of the Correspondence Mean Algorithm in Collaborative Filtering Recommendation (협업 필터링 추천에서 대응평균 알고리즘의 예측 성능에 관한 연구)

  • Lee, Seok-Jun;Lee, Hee-Choon
    • Information Systems Review / v.9 no.1 / pp.85-103 / 2007
  • The purpose of this study is to evaluate the performance of collaborative filtering recommender algorithms in order to improve the prediction accuracy of customers' preferences. Prediction accuracy is compared through the MAE of the neighborhood-based collaborative filtering algorithm and the correspondence mean algorithm, using the MovieLens 1 Million dataset. For the similarity weight used in both algorithms, the commonly used Pearson's correlation coefficient and vector similarity were applied, and the analysis shows that the correspondence mean algorithm predicts customers' preferences more accurately. Pearson's correlation coefficient and vector similarity are both calculated from the preference ratings of two customers' co-rated movies, and the results show that the similarity weight is overestimated when the number of co-rated movies is small. Therefore, the study aims to increase prediction accuracy by expanding the number of co-rated movies.
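
The overestimation noted in this abstract is easy to reproduce: with only two co-rated movies, Pearson's correlation is always +1 or -1 (or undefined). The sketch below assumes the variant in which means are taken over the co-rated items only; the function name and the ratings are made up for illustration and do not come from the MovieLens data.

```python
from math import sqrt

def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation over the movies both users have rated.

    Returns 0.0 when fewer than two movies are co-rated or when either
    user's co-rated ratings have zero variance (correlation undefined).
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    n = len(common)
    if n < 2:
        return 0.0
    mean_a = sum(ratings_a[m] for m in common) / n
    mean_b = sum(ratings_b[m] for m in common) / n
    cov = sum((ratings_a[m] - mean_a) * (ratings_b[m] - mean_b) for m in common)
    var_a = sum((ratings_a[m] - mean_a) ** 2 for m in common)
    var_b = sum((ratings_b[m] - mean_b) ** 2 for m in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)

# Only two co-rated movies (m1, m2): the correlation saturates.
alice = {"m1": 5, "m2": 3, "m3": 4}
bob = {"m1": 4, "m2": 2, "m4": 5}
print(pearson_similarity(alice, bob))
```

Two points always lie on a straight line, so a tiny overlap yields a perfect score regardless of how alike the users really are, which is why enlarging the co-rated set matters.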

Cluster Analysis with Balancing Weight on Mixed-type Data

  • Chae, Seong-San;Kim, Jong-Min;Yang, Wan-Youn
    • Communications for Statistical Applications and Methods / v.13 no.3 / pp.719-732 / 2006
  • A set of clustering algorithms with a proper weight in the formulation of distance, extended to mixed numeric and multiple binary values, is presented. Simple matching and Jaccard coefficients are used to measure the similarity between objects for multiple binary attributes. Similarities are converted to dissimilarities between the i-th and j-th objects. The performance of clustering algorithms with a balancing weight on different similarity measures is demonstrated. Our experiments show that clustering algorithms that apply a proper weight give a competitive recovery level when a data set with mixed numeric and multiple binary attributes is clustered.
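
For reference, the two binary similarity coefficients named here can be computed as in the sketch below. The abstract does not say how similarities are converted to dissimilarities, so the `1 - s` conversion is an assumption (a common choice), and the function names and example vectors are illustrative.

```python
def simple_matching(a, b):
    """Share of attributes on which two binary vectors agree (1-1 or 0-0)."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)

def jaccard(a, b):
    """Share of 1-1 matches among attributes where at least one vector is 1."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return both / either if either else 1.0

def dissimilarity(similarity):
    """Assumed conversion from a similarity in [0, 1] to a dissimilarity."""
    return 1.0 - similarity

obj_i = [1, 0, 1, 1, 0]
obj_j = [1, 1, 0, 1, 0]
print(simple_matching(obj_i, obj_j), jaccard(obj_i, obj_j))
print(dissimilarity(jaccard(obj_i, obj_j)))
```

The two coefficients differ only in whether joint absences (0-0) count as agreement, which is why they can rank the same pair of objects differently.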

A Method for Measuring Similarity Measure of Thesaurus Transformation Documents using DBSCAN (DBSCAN을 활용한 유의어 변환 문서 유사도 측정 방법)

  • Kim, Byeongsik;Shin, Juhyun
    • Journal of Korea Multimedia Society / v.21 no.9 / pp.1035-1043 / 2018
  • Plagiarism includes cases in which the core content of another person's work is presented as one's own thoughts by rewording it without citing the source. The plagiarism test of the free CopyKiller service used for plagiarism checking works by comparing matches of six or more words. However, a six-word match is not enough to detect plagiarism if words have been replaced with similar words. Therefore, in this paper, we construct word clusters using the DBSCAN algorithm, find synonyms, convert the words in each cluster into a representative synonym, and construct L-R tables through L-R parsing. We then propose a method for determining the similarity of documents by applying weights to the thesaurus and weights for each paragraph of the thesis.

Evaluating the Contribution of Spectral Features to Image Classification Using Class Separability

  • Ye, Chul-Soo
    • Korean Journal of Remote Sensing / v.36 no.1 / pp.55-65 / 2020
  • Image classification requires comparing the spectral similarity between the spectral features of each pixel and the representative spectral features of each class. The spectral similarity is obtained by computing the spectral feature vector distance between the pixel and the class. Each spectral feature contributes differently to image classification depending on its class separability, which is computed using a suitable vector distance measure such as the Bhattacharyya distance. We propose a method to determine the weight value of each spectral feature in the computation of the feature vector distance for similarity measurement. The weight value is determined by the ratio of each feature's separability value to the total separability value of all the spectral features. We created ten spectral features consisting of seven bands of a Landsat-8 OLI image and three indices: NDVI, NDWI, and NDBI. For three experimental test sites, we obtained overall accuracies between 95.0% and 97.5% and kappa coefficients between 90.43% and 94.47%.
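
The weighting rule stated in this abstract (each feature's separability divided by the total) can be sketched as below. The abstract does not specify the form of the feature vector distance, so the weighted Euclidean distance is an assumption, as are the function names and the toy numbers; in the paper the separability values would come from a measure such as the Bhattacharyya distance.

```python
from math import sqrt

def feature_weights(separability):
    """Weight of each feature = its separability / sum of all separabilities."""
    total = sum(separability)
    return [value / total for value in separability]

def weighted_distance(pixel, class_mean, weights):
    """Weighted Euclidean distance (assumed form) between a pixel and a class."""
    return sqrt(
        sum(w * (p - c) ** 2 for w, p, c in zip(weights, pixel, class_mean))
    )

# Two toy features; the second separates the classes three times better.
weights = feature_weights([1.0, 3.0])
print(weights)
print(weighted_distance([0.0, 0.0], [2.0, 2.0], weights))
```

Because the weights sum to one, a highly separable feature dominates the distance while a poorly separable one contributes little, which matches the intent described in the abstract.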

A Study on the Effect of Co-Ratings and Correlation Coefficient for Recommender System

  • Lee, Hee-Choon;Lee, Seok-Jun;Park, Ji-Won;Kim, Chul-Seung
    • 한국데이터정보과학회:학술대회논문집 / 2006.11a / pp.59-69 / 2006
  • Pearson's correlation coefficient and vector similarity are generally applied as the users' similarity weight in user-based recommender systems. This study examines how the correlation coefficient used as a similarity weight is affected by the number of paired responses and the significance probability. Correlation coefficients are classified by a significance probability test and by the number of paired responses, and the change in MAE is studied by comparing the prediction accuracy of the two classifications. The experimental results relate the change in MAE to the significance of the correlation coefficient and the number of paired responses.

Gaussian Noise Reduction Technique using Improved Kernel Function based on Non-Local Means Filter (비지역적 평균 필터 기반의 개선된 커널 함수를 이용한 가우시안 잡음 제거 기법)

  • Lin, Yueqi;Choi, Hyunho;Jeong, Jechang
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2018.11a / pp.73-76 / 2018
  • Gaussian noise is caused by the surrounding environment or channel interference when an image is transmitted. The noise degrades not only image quality but also the performance of high-level image processing. The Non-Local Means (NLM) filter removes noise by finding similarity among neighboring sets of pixels and assigning weights according to that similarity; a weighted average is then calculated from the weights. The NLM filter shows low noise removal performance and high complexity in the process of finding similarity through weight allocation and neighbor sets. To solve these problems, we propose an algorithm that achieves excellent noise reduction performance by using a Summed Square Image (SSI) to reduce complexity and applying a weighting function based on a cosine Gaussian kernel function. Experimental results demonstrate the effectiveness of the proposed algorithm.
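
The baseline NLM idea described in this abstract (weights from patch similarity, then a weighted average) can be illustrated on a 1-D signal. This sketch uses the standard Gaussian weight exp(-d²/h²), not the paper's cosine Gaussian kernel or SSI speed-up, neither of which is specified in the abstract; the function name, the parameters `patch_radius`, `search_radius`, and `h`, and the sample signal are all assumptions.

```python
from math import exp

def nlm_denoise_1d(signal, patch_radius=1, search_radius=3, h=0.5):
    """Plain Non-Local Means on a 1-D signal with a Gaussian weight."""
    n = len(signal)

    def patch(center):
        # Neighborhood around `center`, clamped at the signal borders.
        return [
            signal[min(max(center + k, 0), n - 1)]
            for k in range(-patch_radius, patch_radius + 1)
        ]

    output = []
    for i in range(n):
        reference = patch(i)
        numerator = 0.0
        denominator = 0.0
        lo = max(0, i - search_radius)
        hi = min(n, i + search_radius + 1)
        for j in range(lo, hi):
            # Mean squared difference between the two patches.
            d2 = sum((a - b) ** 2 for a, b in zip(reference, patch(j)))
            d2 /= len(reference)
            weight = exp(-d2 / (h * h))  # similar patches get larger weights
            numerator += weight * signal[j]
            denominator += weight
        output.append(numerator / denominator)
    return output

noisy = [0.0, 0.1, -0.1, 0.05, 1.0, 0.9, 1.1, 0.95]
print(nlm_denoise_1d(noisy))
```

Every output sample requires a patch comparison against each candidate in the search window, which is the cost the abstract refers to when it mentions the filter's high complexity.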

Structural Similarity Based Video Quality Metric using Human Visual System (구조적 유사도 기반의 인간의 시각적 특성을 이용한 비디오 품질 측정 기준)

  • Park, Jin-Cheol;Lee, Sang-Hoon
    • Journal of Broadcast Engineering / v.14 no.1 / pp.36-43 / 2009
  • The structural similarity (SSIM) index metric was recently proposed. In the present paper, a new framework called visual SSIM (VSSIM) is proposed by incorporating crucial human factors into SSIM. The human factors are foveation, luminance, frequency, and motion information. The performance of VSSIM is evaluated by a subjective quality test compliant with the Video Quality Experts Group (VQEG) multimedia group test plan. The results show that VSSIM is more correlated with the subjective quality results than the conventional SSIM.
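
As background for the conventional SSIM that VSSIM extends, the sketch below evaluates the standard SSIM formula once over a whole signal. Practical SSIM averages the index over local windows, and the human visual factors of VSSIM are not modeled here; the function name and the sample values are illustrative, while the constants K1 = 0.01 and K2 = 0.03 are the usual defaults.

```python
def global_ssim(x, y, dynamic_range=255.0):
    """SSIM computed over one window covering the whole of x and y."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Sample variances and covariance (divide by n - 1).
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    # Stabilizing constants with the customary K1 = 0.01 and K2 = 0.03.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

reference = [10.0, 20.0, 30.0, 40.0]
print(global_ssim(reference, reference))
print(global_ssim(reference, reference[::-1]))
```

Identical inputs score 1, and structural disagreement (here, a reversed signal with the same mean and variance) lowers the score through the covariance term.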