• Title/Summary/Keyword: Jaccard Index


Performance Analysis of Similarity Reflecting Jaccard Index for Solving Data Sparsity in Collaborative Filtering (협력필터링의 데이터 희소성 해결을 위한 자카드 지수 반영의 유사도 성능 분석)

  • Lee, Soojung
    • The Journal of Korean Association of Computer Education / v.19 no.4 / pp.59-66 / 2016
  • Reflecting the number of co-rated items has been studied as a way to solve the data sparsity problem in collaborative filtering systems. The well-known Jaccard index has improved performance when combined with existing similarity measures. However, the degree of improvement achieved by such combinations across various data environments has seldom been analyzed, which is the objective of this study. The Jaccard index as a sole similarity measure yielded much higher prediction quality than traditional measures and very high recommendation quality in a sparse dataset. In general, existing similarity measures combined with the Jaccard index improved performance regardless of dataset characteristics. In particular, cosine similarity achieved the highest improvement in sparse datasets, while the Mean Squared Difference similarity degraded prediction quality in denser datasets. Therefore, one needs to consider the characteristics of the data environment and of the similarity measures before combining them with the Jaccard index.
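A minimal sketch of the kind of combination this abstract describes, assuming the Jaccard index is computed over the two users' sets of rated items and multiplied into cosine similarity over co-rated items. The multiplicative form, the function names, and the sample ratings are illustrative assumptions, not the paper's exact formulation.

```python
import math


def jaccard(items_u, items_v):
    """Jaccard index of two item collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(items_u), set(items_v)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine_on_corated(ratings_u, ratings_v):
    """Cosine similarity restricted to items both users rated."""
    common = ratings_u.keys() & ratings_v.keys()
    if not common:
        return 0.0
    dot = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = math.sqrt(sum(ratings_u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings_v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def jaccard_weighted_cosine(ratings_u, ratings_v):
    """Assumed combination: scale cosine by the share of co-rated items."""
    return jaccard(ratings_u, ratings_v) * cosine_on_corated(ratings_u, ratings_v)


# Hypothetical ratings: item id -> rating on a 1-5 scale.
alice = {"i1": 5, "i2": 3, "i3": 4}
bob = {"i2": 3, "i3": 4, "i4": 1}

overlap = jaccard(alice, bob)                    # 2 shared of 4 distinct items
combined = jaccard_weighted_cosine(alice, bob)   # cosine damped by the overlap
```

In this toy case the two users agree perfectly on their two co-rated items, so cosine alone would report full similarity; the Jaccard factor halves it because only half of the items either user rated are shared.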

Jaccard Index Reflecting Time-Context for User-based Collaborative Filtering

  • Soojung Lee
    • Journal of the Korea Society of Computer and Information / v.28 no.10 / pp.163-170 / 2023
  • User-based collaborative filtering, one approach to implementing recommender systems, recommends items preferred by neighboring users, who are identified as users with similar rating histories. However, it suffers from a fundamental data sparsity problem: recommendation quality drops significantly when users share little common rating history. To solve this problem, many existing studies have proposed methods that combine the Jaccard index with a similarity measure. In this study, we introduce a time-aware concept to the Jaccard index and propose a method that weights common items differently depending on the rating time. Experiments using various performance metrics and time intervals confirm that the proposed method outperforms the original Jaccard index on most metrics, and that the optimal time interval differs depending on the type of performance metric.
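The abstract does not give the weighting formula, so the following is only one possible reading, sketched under stated assumptions: a common item counts fully when both users rated it within `interval` time units of each other, and counts with a reduced weight otherwise. The function name, the `far_weight` parameter, and the sample timestamps are all hypothetical.

```python
def time_aware_jaccard(times_u, times_v, interval, far_weight=0.5):
    """Jaccard-style score where common items are weighted by rating-time gap.

    times_u, times_v: dict mapping item id -> rating timestamp.
    interval: largest time gap for which a common item keeps full weight.
    far_weight: assumed weight (0..1) for common items rated further apart.
    """
    common = times_u.keys() & times_v.keys()
    union_size = len(times_u.keys() | times_v.keys())
    if union_size == 0:
        return 0.0
    weighted_common = sum(
        1.0 if abs(times_u[i] - times_v[i]) <= interval else far_weight
        for i in common
    )
    return weighted_common / union_size


# Hypothetical timestamps in days.
u = {"a": 10, "b": 20, "c": 30}
v = {"b": 22, "c": 90, "d": 5}

score = time_aware_jaccard(u, v, interval=7)                    # (1.0 + 0.5) / 4
plain = time_aware_jaccard(u, v, interval=7, far_weight=1.0)    # ordinary Jaccard
```

Setting `far_weight=1.0` recovers the ordinary Jaccard index, which makes it easy to compare the two variants on the same data.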

Applying Different Similarity Measures based on Jaccard Index in Collaborative Filtering

  • Lee, Soojung
    • Journal of the Korea Society of Computer and Information / v.26 no.5 / pp.47-53 / 2021
  • Sparse ratings data hinder reliable similarity computation between users, which degrades the performance of memory-based collaborative filtering techniques for recommender systems. Many works in the literature have addressed this data sparsity problem, and the simplest and most representative ones are methods that use the Jaccard index. This index reflects the number of items commonly rated by two users and is mostly integrated into traditional similarity measures to compute the similarity between users more accurately. However, such integration is very straightforward and does not consider the degree of data sparsity. This study suggests a novel idea of applying different similarity measures depending on the numeric value of the Jaccard index between two users. Performance experiments are conducted to obtain optimal values for the parameters used by the proposed method and to evaluate it against other relevant methods. As a result, the proposed method demonstrates the best or comparable performance in prediction and recommendation accuracy.
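The idea of switching similarity measures by Jaccard value can be sketched as follows. The abstract does not say which measures are used or where the cut-off lies, so the threshold of 0.3, the fallback to the Jaccard index itself, and the use of a Mean Squared Difference similarity above the threshold are assumptions made purely for illustration.

```python
def jaccard(ratings_u, ratings_v):
    """Jaccard index over the sets of items each user rated."""
    a, b = set(ratings_u), set(ratings_v)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def msd_similarity(ratings_u, ratings_v, max_diff=4):
    """1 minus the mean squared rating difference, normalized to [0, 1].

    max_diff is the largest possible rating gap (4 on a 1-5 scale).
    """
    common = ratings_u.keys() & ratings_v.keys()
    if not common:
        return 0.0
    msd = sum(((ratings_u[i] - ratings_v[i]) / max_diff) ** 2 for i in common)
    return 1.0 - msd / len(common)


def switched_similarity(ratings_u, ratings_v, threshold=0.3):
    """Pick the measure based on how much the two rating histories overlap."""
    j = jaccard(ratings_u, ratings_v)
    if j < threshold:
        # Too few co-rated items to trust rating values: use overlap only.
        return j
    return msd_similarity(ratings_u, ratings_v)


# Hypothetical ratings on a 1-5 scale.
alice = {"i1": 5, "i2": 3, "i3": 4}
bob = {"i2": 3, "i3": 4, "i4": 1}
carol = {"i1": 1, "i5": 2, "i6": 3, "i7": 4, "i8": 5}

dense_pair = switched_similarity(alice, bob)     # overlap 0.5, uses MSD
sparse_pair = switched_similarity(alice, carol)  # overlap 1/7, uses Jaccard
```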

Comparison of Plant Diversity of Natural Forest and Plantations of Rema-Kalenga Wildlife Sanctuary of Bangladesh

  • Sobuj, Norul-Alam;Rahman, Mizanur
    • Journal of Forest and Environmental Science / v.27 no.3 / pp.127-134 / 2011
  • The purpose of the study was to assess and compare the diversity of plant species (trees, shrubs, herbs) in natural forest and plantations. A total of 52 plant species were recorded in the natural forest, of which 16 were trees, 15 were shrubs and 21 were herbs. In contrast, 31 plant species, including 11 trees, 8 shrubs and 12 herbs, were identified in the plantation forest. The Shannon-Wiener diversity index values were 2.70, 2.72 and 3.12 for trees, shrubs and herbs, respectively, in the natural forest. In the plantation forest, the values were 2.35 for tree species, 2.31 for shrub species and 2.81 for herb species. Jaccard's similarity index showed that 71% of tree species, 44% of shrub species and 43% of herb species were shared between the plantations and the natural forest.
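For readers unfamiliar with the two indices reported here, the following sketch computes them from toy data. It uses the standard definitions, H' = -Σ p_i ln p_i for Shannon-Wiener and |A ∩ B| / |A ∪ B| for Jaccard; the species names and counts are made up and are not taken from the study.

```python
import math


def shannon_wiener(counts):
    """Shannon-Wiener diversity H' = -sum(p_i * ln p_i) over species counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def jaccard_similarity(species_a, species_b):
    """Fraction of all recorded species that occur at both sites."""
    a, b = set(species_a), set(species_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical individual counts per species at one site.
even_site = [10, 10]          # two equally common species
h_even = shannon_wiener(even_site)

# Hypothetical species lists for two sites.
natural = {"Tectona", "Shorea", "Ficus"}
plantation = {"Shorea", "Ficus", "Acacia", "Eucalyptus"}
shared = jaccard_similarity(natural, plantation)   # 2 shared of 5 species
```

Multiplying the Jaccard value by 100 gives the percentage form used in the abstract.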

Comparison of the Cluster Validation Methods for High-dimensional (Gene Expression) Data (고차원 (유전자 발현) 자료에 대한 군집 타당성분석 기법의 성능 비교)

  • Jeong, Yun-Kyoung;Baek, Jang-Sun
    • The Korean Journal of Applied Statistics / v.20 no.1 / pp.167-181 / 2007
  • Many clustering algorithms and cluster validation techniques for high-dimensional gene expression data have been suggested. However, these cluster validation techniques have seldom been evaluated. In this paper, we compare various cluster validity indices on low-dimensional simulation data and real gene expression data, and find that, among the internal measures, Dunn's index is the most effective and robust, the Silhouette index is next, and the Davies-Bouldin index is the least effective. Among the external measures, the Jaccard index is much more effective than the Goodman-Kruskal index and the adjusted Rand index.
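As an external validity measure, the Jaccard index is usually computed by pair counting: of all pairs of objects placed together in at least one of the two partitions, what fraction is placed together in both. The sketch below follows that standard definition; the function name and the label vectors are hypothetical, and returning 1.0 when no pair is ever grouped is a convention assumed here.

```python
from itertools import combinations


def pair_jaccard(true_labels, pred_labels):
    """Pair-counting Jaccard index between a reference and a predicted clustering."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    both = only_true = only_pred = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            both += 1          # pair grouped together in both partitions
        elif same_true:
            only_true += 1     # grouped in the reference only
        elif same_pred:
            only_pred += 1     # grouped in the prediction only
    denom = both + only_true + only_pred
    # Assumed convention: two all-singleton partitions count as identical.
    return both / denom if denom else 1.0


reference = [0, 0, 1, 1]
predicted = [0, 0, 1, 2]      # splits the second reference cluster
relabeled = [1, 1, 0, 0]      # same partition, different label names

partial = pair_jaccard(reference, predicted)
perfect = pair_jaccard(reference, relabeled)
```

Because only co-membership of pairs matters, the score is unaffected by how cluster labels are named.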

Proposal of Content Recommend System on Insurance Company Web Site Using Collaborative Filtering (협업필터링을 활용한 보험사 웹 사이트 내의 콘텐츠 추천 시스템 제안)

  • Kang, Jiyoung;Lim, Heuiseok
    • Journal of Digital Convergence / v.17 no.11 / pp.201-206 / 2019
  • While many users search for insurance information online, there has been little research on content recommendation for insurance companies' websites. Therefore, this study proposes a system that recommends pages users are likely to prefer, using the page visit history of insurance companies' websites. Data was collected using the client-side storage generated when a web browser is used. Collaborative filtering was applied as the recommendation technique. In the experiments, item-based collaborative filtering (IBCF) based on the Jaccard index, using binary data that indicates whether or not a page was visited, showed good performance. In the future, studying recommendation techniques that weight items should make it possible to implement a content recommendation system that matches a company's marketing strategy.
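A small sketch of item-based collaborative filtering on binary visit data, in the spirit of what the abstract describes. It assumes page-to-page similarity is the Jaccard index over the sets of users who visited each page, and that a candidate page is scored by summing its similarity to the pages a user has already visited. The scoring rule, function names, user ids, and page names are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict


def item_jaccard(visits):
    """Pairwise Jaccard similarity between pages.

    visits: dict mapping user id -> set of visited pages (binary data).
    Returns a dict keyed by (page, page) for pairs with at least one shared visitor.
    """
    visitors = defaultdict(set)
    for user, pages in visits.items():
        for page in pages:
            visitors[page].add(user)
    sims = {}
    pages = sorted(visitors)
    for idx, p in enumerate(pages):
        for q in pages[idx + 1:]:
            shared = len(visitors[p] & visitors[q])
            if shared:
                score = shared / len(visitors[p] | visitors[q])
                sims[(p, q)] = sims[(q, p)] = score
    return sims


def recommend(visits, user, top_n=3):
    """Rank unvisited pages by summed similarity to the user's visited pages."""
    sims = item_jaccard(visits)
    seen = visits[user]
    all_pages = set().union(*visits.values())
    scores = {}
    for candidate in all_pages - seen:
        score = sum(sims.get((candidate, p), 0.0) for p in seen)
        if score > 0:
            scores[candidate] = score
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_n]


# Hypothetical visit logs.
visits = {
    "u1": {"auto", "claims"},
    "u2": {"auto", "claims", "life"},
    "u3": {"auto", "life"},
    "u4": {"claims"},
}
suggestions = recommend(visits, "u4")
```

Jaccard suits this setting because the data only records whether a page was visited, so there are no rating values for measures such as cosine or Pearson correlation to exploit.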

A Study on the Cloud Detection Technique of Heterogeneous Sensors Using Modified DeepLabV3+ (DeepLabV3+를 이용한 이종 센서의 구름탐지 기법 연구)

  • Kim, Mi-Jeong;Ko, Yun-Ho
    • Korean Journal of Remote Sensing / v.38 no.5_1 / pp.511-521 / 2022
  • Cloud detection and removal from satellite images is an essential process for topographic observation and analysis. Threshold-based cloud detection techniques show stable performance because they rely on the physical characteristics of clouds, but they have the disadvantage of requiring images from all channels and a long computation time. Recently studied cloud detection techniques based on deep learning show short computation times and excellent performance even when using images with four or fewer channels (RGB, NIR). In this paper, we confirm how the performance of a deep learning network depends on heterogeneous training datasets with different resolutions. The DeepLabV3+ network was modified so that channel features for cloud detection are extracted, and it was trained on two published heterogeneous datasets and on mixed data, respectively. In the experiments, the Jaccard index for clouds was low in a network trained on images of a different kind from the test images. However, the Jaccard index for clouds was high in a network trained on mixed data that included some data of the same kind as the test data. Because clouds have no structured shape, reflecting channel features in training is more effective for cloud detection than reflecting spatial features. It is necessary to learn the channel features of each satellite sensor for cloud detection. Therefore, cloud detection for heterogeneous sensors with different resolutions depends strongly on the training dataset.

Attention-based deep learning framework for skin lesion segmentation (피부 병변 분할을 위한 어텐션 기반 딥러닝 프레임워크)

  • Afnan Ghafoor;Bumshik Lee
    • Smart Media Journal / v.13 no.3 / pp.53-61 / 2024
  • This paper presents a novel M-shaped encoder-decoder architecture for skin lesion segmentation, achieving better performance than existing approaches. The proposed architecture utilizes the left and right legs to enable multi-scale feature extraction and is further enhanced by integrating an attention module within the skip connection. The image is partitioned into four distinct patches, facilitating enhanced processing within the encoder-decoder framework. A pivotal aspect of the proposed method is to focus more on critical image features through an attention mechanism, leading to refined segmentation. Experimental results highlight the effectiveness of the proposed approach, demonstrating superior accuracy, precision, and Jaccard Index compared to existing methods.

Deep Learning based Skin Lesion Segmentation Using Transformer Block and Edge Decoder (트랜스포머 블록과 윤곽선 디코더를 활용한 딥러닝 기반의 피부 병변 분할 방법)

  • Kim, Ji Hoon;Park, Kyung Ri;Kim, Hae Moon;Moon, Young Shik
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.4 / pp.533-540 / 2022
  • Specialists diagnose skin cancer using dermatoscopy to detect it as early as possible, but it is difficult to delineate skin lesions accurately because they have various shapes. Recent skin lesion segmentation methods using deep learning have shown high performance, but they have difficulty segmenting skin lesions because the boundary between healthy skin and lesions is not clear. To solve these issues, the proposed method constructs a transformer block to effectively segment the skin lesion, and constructs an edge decoder for each layer of the network to segment the skin lesion in detail. Experimental results show that the proposed method achieves a performance improvement of 0.041 ~ 0.071 in Dice Coefficient and 0.062 ~ 0.112 in Jaccard Index, compared with the previous method.
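The two segmentation metrics reported here are closely related, which the following sketch illustrates on a toy pair of binary masks. It uses the standard definitions, Jaccard = TP / (TP + FP + FN) and Dice = 2TP / (2TP + FP + FN); the masks are made up, and treating two empty masks as a perfect match is a convention assumed here.

```python
def confusion_counts(pred_mask, true_mask):
    """Count true positives, false positives and false negatives pixel by pixel."""
    tp = fp = fn = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    return tp, fp, fn


def jaccard_index(pred_mask, true_mask):
    """Intersection over union of the two foreground regions."""
    tp, fp, fn = confusion_counts(pred_mask, true_mask)
    denom = tp + fp + fn
    return tp / denom if denom else 1.0   # assumed: empty vs empty is a match


def dice_coefficient(pred_mask, true_mask):
    """Twice the overlap divided by the total foreground size of both masks."""
    tp, fp, fn = confusion_counts(pred_mask, true_mask)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


# Hypothetical 2x3 masks: 1 marks lesion pixels.
predicted = [[1, 1, 0],
             [0, 1, 0]]
ground_truth = [[1, 0, 0],
                [0, 1, 1]]

j = jaccard_index(predicted, ground_truth)
d = dice_coefficient(predicted, ground_truth)
j_from_dice = d / (2 - d)     # identity linking the two metrics
```

Since J = D / (2 - D), the Dice value is never below the Jaccard value for the same pair of masks, so an improvement of a given size in one metric corresponds to a different size in the other.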

Complimentary Assessment for Conserving Vegetation on Protected Areas in South Korea (보호지역의 식물종 보전 상보성 평가)

  • Park, Jin-Han;Choe, Hyeyeong;Mo, Yongwon
    • Korean Journal of Environment and Ecology / v.34 no.5 / pp.436-445 / 2020
  • The number of protected areas in Korea has steadily increased to achieve Aichi Target 11, and there are studies on potential protected areas that require additional designation. However, the complementarity of protected areas for conserving biodiversity effectively has not been sufficiently assessed. This study identified potential habitat areas using a species distribution model for plant species from the 3rd National Ecosystem Survey and compared the plant species abundance in the existing protected areas and the potential protected areas using similarity indices such as the Jaccard index, Sørensen index, and Bray-Curtis index. As a result, we found that the complementarity between the existing protected areas and most potential protected areas was low, meaning that they preserve similar plant species. Only the buffer zone for Korea National Arboretum had high complementarity and is thus important for conserving some species alongside the other protected areas. This study confirmed that additional protected areas outside the existing or potential protected areas need to be selected to protect plant species whose potential habitats have a low inclusion ratio within the protected areas. This study is significant because it identified the ecological representativeness of each protected area, to examine whether individual protected areas can conserve unique and varied species, and proposed a method for spatially finding candidate areas for additional conservation. The findings of this study can serve as a valuable reference for the qualitative improvement of protected areas through complementarity assessments that include animals, and for future effectiveness assessments of protected areas using the National Ecosystem Survey data.
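The three similarity indices named in this abstract differ in what they count, which a short sketch can make concrete. It uses the standard definitions: Jaccard and Sørensen work on presence or absence, while Bray-Curtis works on abundances. The site data and species labels are hypothetical, and how the study turns similarity into a complementarity score is not stated in the abstract, so that step is not shown.

```python
def present(abundance):
    """Species with a non-zero abundance at a site."""
    return {species for species, count in abundance.items() if count > 0}


def jaccard(abund_a, abund_b):
    """Shared species divided by all species found at either site."""
    a, b = present(abund_a), present(abund_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sorensen(abund_a, abund_b):
    """Twice the shared species divided by the sum of both species counts."""
    a, b = present(abund_a), present(abund_b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def bray_curtis_similarity(abund_a, abund_b):
    """1 minus Bray-Curtis dissimilarity, sum|x - y| / sum(x + y)."""
    species = abund_a.keys() | abund_b.keys()
    diff = sum(abs(abund_a.get(s, 0) - abund_b.get(s, 0)) for s in species)
    total = sum(abund_a.get(s, 0) + abund_b.get(s, 0) for s in species)
    return 1.0 - diff / total if total else 0.0


# Hypothetical abundances per species for two areas.
existing_area = {"sp1": 4, "sp2": 0, "sp3": 6}
candidate_area = {"sp1": 2, "sp2": 3, "sp3": 6}

j = jaccard(existing_area, candidate_area)                  # 2 of 3 species shared
s = sorensen(existing_area, candidate_area)                 # 2*2 / (2 + 3)
bc = bray_curtis_similarity(existing_area, candidate_area)  # 1 - 5/21
```

High values on all three indicate that a candidate area holds largely the same species as an existing one, which is the situation the abstract describes as low complementarity.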