• Title/Summary/Keyword: Jaccard Coefficient

Improving Performance of Jaccard Coefficient for Collaborative Filtering

  • Lee, Soojung
    • Journal of the Korea Society of Computer and Information / v.21 no.11 / pp.121-126 / 2016
  • In recommender systems based on collaborative filtering, similarity measurement is critical for determining the range of recommenders. Data sparsity is a fundamental problem in collaborative filtering systems, and it is partly addressed by combining the Jaccard coefficient with traditional similarity measures. This study proposes a new coefficient that improves on the Jaccard coefficient by compensating for its drawbacks. Experiments were conducted on datasets with varied characteristics. Compared with Pearson correlation, a similarity metric widely used to date, the proposed coefficient performed competitively on a dense dataset and much better on a sparser dataset. Compared with the Jaccard coefficient, the proposed coefficient performed far better as the dataset became denser. Overall, the proposed coefficient demonstrated the best prediction and recommendation performance among the metrics tested.
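As a point of reference for the abstract above, the plain Jaccard coefficient between two users can be sketched as the overlap of the item sets they have rated. This is a minimal illustration only: the `ratings` data and the function name are hypothetical, and the coefficient proposed in the paper is not specified in the abstract and is not reproduced here.

```python
def jaccard_similarity(items_a, items_b):
    """Jaccard coefficient between the sets of items two users have rated."""
    a, b = set(items_a), set(items_b)
    union = a | b
    if not union:
        return 0.0  # convention assumed here: two empty profiles have no similarity
    return len(a & b) / len(union)

# Hypothetical ratings: user -> {item: rating}
ratings = {
    "u1": {"i1": 5, "i2": 3, "i3": 4},
    "u2": {"i2": 4, "i3": 2, "i4": 5},
    "u3": {"i5": 1},
}

# Iterating a dict yields its keys, so only the rated item ids are compared.
sim_u1_u2 = jaccard_similarity(ratings["u1"], ratings["u2"])  # 2 shared items out of 4
sim_u1_u3 = jaccard_similarity(ratings["u1"], ratings["u3"])  # no shared items
```

Note that this plain form looks only at which items were rated and ignores the rating values themselves.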

Stagewise Weak Orthogonal Matching Pursuit Algorithm Based on Adaptive Weak Threshold and Arithmetic Mean

  • Zhao, Liquan;Ma, Ke
    • Journal of Information Processing Systems / v.16 no.6 / pp.1343-1358 / 2020
  • In the stagewise arithmetic orthogonal matching pursuit algorithm, the weak threshold used in sparsity estimation is determined by the maximum number of iterations. Different maximum iteration counts correspond to different thresholds and affect the performance of the algorithm. To solve this problem, we propose an improved variable weak threshold based on the stagewise arithmetic orthogonal matching pursuit algorithm. Our proposed algorithm uses the residual error value to control the weak threshold. As the residual value decreases, the threshold value continuously increases, so that the number of atoms in the atomic set comes closer to the real sparsity value, making it possible to improve the reconstruction accuracy. In addition, we improved the generalized Jaccard coefficient to replace the inner product method used in the stagewise arithmetic orthogonal matching pursuit algorithm. Our proposed algorithm uses the covariance in place of the joint expectation of two variables in the generalized Jaccard coefficient. The improved generalized Jaccard coefficient gives a more accurate measure of the correlation between the measurement matrices. The residual is also more accurate, which reduces the chance of selecting the wrong atoms. Simulations show that the proposed algorithm produces better reconstruction results for both a one-dimensional signal and a two-dimensional image signal.
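For orientation, one common form of the generalized Jaccard coefficient for real-valued vectors (also known as the Tanimoto coefficient) divides the inner product by the sum of squared norms minus the inner product. The sketch below shows only that standard form; the covariance-based modification described in the abstract is not specified there and is not attempted, and the function name is an assumption.

```python
def generalized_jaccard(x, y):
    """Generalized (Tanimoto) Jaccard coefficient: x.y / (|x|^2 + |y|^2 - x.y)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = sum(a * a for a in x)
    norm_y = sum(b * b for b in y)
    denom = norm_x + norm_y - dot
    return dot / denom if denom else 0.0  # assumed convention for two zero vectors

# For binary vectors this reduces to the set-based Jaccard coefficient:
# one shared nonzero position out of three distinct nonzero positions.
binary_score = generalized_jaccard([1, 0, 1], [1, 1, 0])

# Identical vectors score 1.
identical_score = generalized_jaccard([1, 2, 3], [1, 2, 3])
```

Unlike a raw inner product, this score is normalized by the magnitudes of both vectors.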

Hierarchic Document Clustering in OPAC (OPAC에서 자동분류 열람을 위한 계층 클러스터링 연구)

  • 노정순
    • Journal of the Korean Society for Information Management / v.21 no.1 / pp.93-117 / 2004
  • This study develops a hierarchic clustering model for document classification and browsing in OPAC systems. Two automatic indexing techniques (with and without controlled terms), two term weighting methods (based on term frequency and binary weight), five similarity coefficients (Dice, Jaccard, Pearson, Cosine, and Squared Euclidean), and three hierarchic clustering algorithms (Between Average Linkage, Within Average Linkage, and Complete Linkage) were tested on a collection of 175 books and theses on library and information science. The best document clusters resulted from the Between Average Linkage or Complete Linkage method with the Jaccard or Dice coefficient, applied to automatic indexing with controlled terms in binary vectors. The clusters from Between Average Linkage with Jaccard were more likely to have a decimal classification structure.
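The combination the abstract reports as working well (binary term vectors, Jaccard coefficient, average linkage) can be sketched as a small agglomerative procedure. This is an illustrative sketch under assumptions: "Between Average Linkage" is read here as between-groups average linkage, documents are represented as sets of index terms, and the `docs` data and function names are hypothetical.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """Jaccard distance (1 - Jaccard coefficient) between two sets of index terms."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def cluster_distance(docs, c1, c2):
    """Between-groups average linkage: mean distance over all cross-cluster pairs."""
    total = sum(jaccard_distance(docs[p], docs[q]) for p in c1 for q in c2)
    return total / (len(c1) * len(c2))

def average_linkage(docs):
    """Merge the two closest clusters until one remains; return the merge history."""
    clusters = [(name,) for name in docs]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(docs, *pair))
        merges.append((c1, c2, cluster_distance(docs, c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

# Hypothetical documents described by controlled index terms (binary weighting).
docs = {
    "d1": {"cataloging", "classification", "opac"},
    "d2": {"classification", "opac", "indexing"},
    "d3": {"retrieval", "ranking"},
}
merges = average_linkage(docs)  # d1 and d2 merge first, then d3 joins
```

The merge history is the hierarchy: reading it from first to last gives the clusters from most to least similar.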

Road Tracking based on Prior Information in Video Sequences (비디오 영상에서 사전정보 기반의 도로 추적)

  • Lee, Chang Woo
    • Journal of Korea Society of Industrial Information Systems / v.18 no.2 / pp.19-25 / 2013
  • In this paper, we propose an approach to tracking road regions in video sequences. The proposed method segments and tracks road regions by using prior information from the result of the previous frame. For efficiency, we make the simple assumption that the road region usually appears in the lower part of the input image, so the lower 60% of each input image is set as the region of interest (ROI). After initial segmentation using a flood-fill algorithm, we merge neighboring regions based on a color similarity measure. The previous segmentation result, from which seed points for the next frame are extracted, is used as prior information to segment the current frame. The similarity between the road region of the previous frame and that of the current frame is measured by a modified Jaccard coefficient. According to this similarity, we refine and track the detected road regions. The experimental results show that the proposed method is effective at segmenting and tracking road regions in both noisy and non-noisy environments.
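The frame-to-frame comparison described above can be illustrated with the standard Jaccard coefficient between two binary road masks. The abstract does not say how its "modified" coefficient differs, so the sketch below uses the unmodified form; the masks and function name are hypothetical.

```python
def mask_jaccard(prev_mask, curr_mask):
    """Standard Jaccard coefficient between two equally sized binary masks."""
    intersection = 0
    union = 0
    for prev_row, curr_row in zip(prev_mask, curr_mask):
        for p, c in zip(prev_row, curr_row):
            if p and c:
                intersection += 1
            if p or c:
                union += 1
    # Assumed convention: two empty masks are treated as identical.
    return intersection / union if union else 1.0

# Hypothetical 3x4 road masks (1 = road pixel) for consecutive frames.
prev_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
]
curr_mask = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 1],
]
similarity = mask_jaccard(prev_mask, curr_mask)  # 5 shared pixels out of 7 in the union
```

A tracker could compare this score with a threshold to decide whether the current segmentation is consistent with the previous frame; the threshold would be a tuning choice not given in the abstract.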

Predicting link of R&D network to stimulate collaboration among education, industry, and research (산학연 협업 활성화를 위한 R&D 네트워크 연결 예측 연구)

  • Park, Mi-yeon;Lee, Sangheon;Jin, Guocheng;Shen, Hongme;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.21 no.3 / pp.37-52 / 2015
  • Recent global trends show expanding and increasingly solid collaboration among industry, education, and research, along with growing R&D network systems. Greater support for networks and cooperative research would open possibilities for new academic and industrial fields and for new theories arising from synergistic research. Likewise, there is a growing national need for a strategy that can efficiently and effectively support the R&D networks established through government R&D projects. Despite this urgency, network policies remain inadequate because they habitually depend on simple personal information about R&D participants and on generalized statistics. Accordingly, this study analyzed the relationships among the subjects participating in R&D and, on the basis of an education-industry-research network, predicted possible changes in that network. To predict R&D network transitions, the Common Neighbor and Jaccard's Coefficient models were taken as base models, and a new prediction model was proposed to address their limitations and increase link prediction accuracy; a comparative analysis was then carried out. By predicting R&D network changes effectively, the results serve as a stepping-stone toward a strategy that supports a desirable education-industry-research network and toward national policy that can effectively and efficiently sponsor integrated R&D.
Both weighted variants, of Common Neighbor and of Jaccard's Coefficient, gave positive outcomes, but the accuracy gain was larger for weighted Common Neighbor. The unweighted Common Neighbor model predicted 650 out of 4,136, whereas the weighted Common Neighbor model predicted 50 more, for a total of 700. The Jaccard's Coefficient model showed a slight numerical improvement, but the difference was insignificant.
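The two base models named above can be sketched on a toy collaboration graph: Common Neighbor counts the partners two organizations share, and Jaccard's Coefficient normalizes that count by the size of their combined neighborhoods. The graph, node names, and function names below are hypothetical, and the weighted variants from the paper are not specified in the abstract and are not shown.

```python
from itertools import combinations

def common_neighbors(graph, u, v):
    """Number of neighbors shared by u and v."""
    return len(graph[u] & graph[v])

def jaccard_score(graph, u, v):
    """Shared neighbors divided by all distinct neighbors of u and v."""
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0

# Hypothetical undirected collaboration graph as adjacency sets.
graph = {
    "univ_A": {"firm_B", "lab_C"},
    "firm_B": {"univ_A", "lab_C", "firm_D"},
    "lab_C": {"univ_A", "firm_B", "firm_D"},
    "firm_D": {"firm_B", "lab_C", "inst_E"},
    "inst_E": {"firm_D"},
}

# Candidate links are the pairs that are not yet connected.
candidates = [(u, v) for u, v in combinations(sorted(graph), 2) if v not in graph[u]]

# Rank candidates by Jaccard score, highest first.
ranked = sorted(candidates, key=lambda pair: jaccard_score(graph, *pair), reverse=True)
top_pair = ranked[0]
```

In this toy graph the top-ranked candidate is the pair that shares two partners, which both scores agree on; the two measures can disagree when one node has many more neighbors than the other.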

Extraction of Building Boundary on Aerial Image Using Segmentation and Overlaying Algorithm (분할과 중첩 기법을 이용한 항공 사진 상의 빌딩 경계 추출)

  • Kim, Yong-Min;Chang, An-Jin;Kim, Yong-Il
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.30 no.1 / pp.49-58 / 2012
  • Buildings become more complex and diverse with time. It is difficult to extract individual buildings using only an optical image, because they have spectral characteristics similar to those of objects such as vegetation and roads. In this study, we propose a method to extract building areas and boundaries by integrating airborne Light Detection and Ranging (LiDAR) data and aerial images. First, a binary edge map was generated using the Edison edge detector after applying the Adaptive dynamic range linear stretching radiometric enhancement algorithm to the aerial image. Second, building objects were extracted from the airborne LiDAR data using the normalized Digital Surface Model and the aerial image. Then, temporary building areas were extracted by overlaying the binary edge map and the building objects extracted from the LiDAR data. Finally, some building boundaries were further refined, taking into account the positional accuracy between the LiDAR data and the aerial image. The proposed method was applied to two experimental sites for validation. From the error matrix, the F-measure, Jaccard coefficient, Yule coefficient, and overall accuracy were calculated, and all values were higher than 0.85.
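Three of the accuracy measures listed above follow directly from the four cells of a binary error matrix. The sketch below uses the standard definitions with hypothetical pixel counts; the Yule coefficient is left out because the abstract does not state which formulation was used.

```python
def accuracy_measures(tp, fp, fn, tn):
    """F-measure, Jaccard coefficient, and overall accuracy from a 2x2 error matrix."""
    f_measure = 2 * tp / (2 * tp + fp + fn)
    jaccard = tp / (tp + fp + fn)
    overall = (tp + tn) / (tp + fp + fn + tn)
    return f_measure, jaccard, overall

# Hypothetical counts: building pixels correctly found (tp), falsely found (fp),
# missed (fn), and background pixels correctly rejected (tn).
f_measure, jaccard, overall = accuracy_measures(tp=80, fp=10, fn=10, tn=100)
```

The Jaccard coefficient ignores true negatives, so it is never larger than the F-measure for the same counts and is less flattered by large background areas than overall accuracy.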

Longitudinal Variation of Fish Communities in the Geum River, Korea: Application of the Concept of Beta Diversity and Local Uniqueness

  • Kim, Jeong-Hui;Park, Sang-Hyeon;Baek, Seung-Ho;Hong, Donghyun;Jo, Hyunbin
    • Proceedings of the National Institute of Ecology of the Republic of Korea / v.3 no.2 / pp.122-128 / 2022
  • To present the spatial variation of fish assemblages in the Geum River in Korea, beta diversity (β-diversity) estimates based on the variance of the community data table were applied. Fish communities and environmental variables were collected from 13 sampling sites along the mid-to-lower reaches of the river. We calculated the β-diversity and local contribution to beta diversity (LCBD) values at each site for two types of data: 'occurrence', with Jaccard and Sørensen dissimilarity coefficients, and 'abundance', with Hellinger distance. Multivariate and correlation analyses were also performed to determine the relationships between LCBD and other variables, such as community indices and physicochemical and hydrological factors. The β-diversity values of fish communities in the river were estimated as 0.218 and 0.145 for the occurrence data table with Jaccard and Sørensen, respectively, and 0.268 for abundance data. Similar patterns of LCBD along the sampling sites were detected with the two dissimilarity measures on the occurrence table, while LCBD values with abundance data were slightly different. The LCBD values were strongly correlated with community indices and were suitable for indicating the uniqueness of fish assemblages. However, further research is needed to establish the LCBD value as an indicator of environmental variability.
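The variance-based calculation mentioned above can be sketched for the abundance case: Hellinger-transform the site-by-species table, take the total sum of squared deviations from the species means, divide by n - 1 for total β-diversity, and take each site's share of the sum of squares as its LCBD. This is a minimal sketch under that standard formulation; the abundance table and function names are hypothetical, and the dissimilarity-based route used for occurrence data is not shown.

```python
import math

def hellinger_transform(table):
    """Square root of each abundance divided by its site (row) total."""
    transformed = []
    for row in table:
        total = sum(row)
        transformed.append([math.sqrt(v / total) if total else 0.0 for v in row])
    return transformed

def beta_diversity(table):
    """Return (total beta diversity, list of LCBD values) for a site-by-species table."""
    y = hellinger_transform(table)
    n, p = len(y), len(y[0])
    means = [sum(y[i][j] for i in range(n)) / n for j in range(p)]
    # Each site's contribution to the total sum of squares.
    ss_site = [sum((y[i][j] - means[j]) ** 2 for j in range(p)) for i in range(n)]
    ss_total = sum(ss_site)
    bd_total = ss_total / (n - 1)
    lcbd = [s / ss_total if ss_total else 0.0 for s in ss_site]
    return bd_total, lcbd

# Hypothetical abundances: 3 sites (rows) by 2 species (columns).
abundance = [
    [10, 0],
    [10, 0],
    [0, 10],
]
bd_total, lcbd = beta_diversity(abundance)  # the third site is the most unique
```

LCBD values sum to 1 across sites, so a site with a larger value holds a more distinctive assemblage relative to the others.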

Deep Learning based Skin Lesion Segmentation Using Transformer Block and Edge Decoder (트랜스포머 블록과 윤곽선 디코더를 활용한 딥러닝 기반의 피부 병변 분할 방법)

  • Kim, Ji Hoon;Park, Kyung Ri;Kim, Hae Moon;Moon, Young Shik
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.4 / pp.533-540 / 2022
  • Specialists use dermatoscopy to detect skin cancer as early as possible, but it is difficult to delineate skin lesions accurately because they have various shapes. Recent skin lesion segmentation methods based on deep learning have shown high performance, but they still have difficulty segmenting lesions because the boundary between healthy skin and the lesion is not clear. To solve these issues, the proposed method constructs a transformer block to segment the skin lesion effectively, and constructs an edge decoder for each layer of the network to segment the skin lesion in detail. Experimental results show that the proposed method achieves a performance improvement of 0.041 to 0.071 in Dice coefficient and 0.062 to 0.112 in Jaccard index, compared with the previous method.
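The two segmentation scores reported above are closely related: for the same prediction, the Jaccard index J and the Dice coefficient D satisfy J = D / (2 - D). A minimal sketch on hypothetical pixel sets, with assumed function and variable names:

```python
def dice_and_jaccard(predicted, truth):
    """Dice coefficient and Jaccard index between two sets of lesion pixels."""
    predicted, truth = set(predicted), set(truth)
    intersection = len(predicted & truth)
    total = len(predicted) + len(truth)
    union = total - intersection
    # Assumed convention: two empty masks count as a perfect match.
    dice = 2 * intersection / total if total else 1.0
    jaccard = intersection / union if union else 1.0
    return dice, jaccard

# Hypothetical lesion pixels as (row, col) coordinates.
predicted = {(0, 0), (0, 1), (1, 0), (1, 1)}
truth = {(0, 1), (1, 1), (1, 2)}
dice, jaccard = dice_and_jaccard(predicted, truth)
```

Because Jaccard is never larger than Dice for the same masks, the same segmentation change generally shows up as different-sized gains in the two scores.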

Red Tide Detection Based on Two Stage Filtering with MODIS Chlorophyll Information (MODIS 클로로필 정보를 이용한 2단계 필터링 기반 적조 탐지)

  • Kim, Yong-Min;Byun, Young-Gi;Kim, Yong-Il;Yu, Ki-Yun
    • Proceedings of the KSRS Conference / 2008.03a / pp.170-175 / 2008
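  • This study presents an algorithm that uses two-stage filtering, based on chlorophyll information provided by MODIS, to detect the Cochlodinium polykrikoides red tides that occurred on a large scale along the east and south coasts of Korea. Typical red tide detection studies exploit the correlation between chlorophyll and red tide occurrence and flag waters with high chlorophyll concentration as red tide areas. The problem with this approach is that it produces commission errors by flagging waters where no red tide occurred. To overcome this problem, this study extracts red tide areas from the MODIS chlorophyll information and applies a two-stage filtering process, which removed the commission errors that arose in waters near Jinhae, Yeosu, and Namhae-do. The results were evaluated visually against the red tide bulletin data of the National Fisheries Research and Development Institute (국립수산과학원) to verify the usefulness of the proposed algorithm. For quantitative evaluation in future work, accuracy assessment will be carried out using the F-measure, JC (Jaccard coefficient), YC (Yule coefficient), and overall accuracy as detection accuracy measures.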

Impact of the Pollution on the Benthic Community: Environmental impact of the pollution on the benthic coralligenous community in the Gulf of Fos, northwestern Mediterranean (북서 지중해 Fos해역의 해양오염이 해양저서생물군집 Coralligenous Community에 미치는 영향)

  • HONG Jae-Sang
    • Korean Journal of Fisheries and Aquatic Sciences / v.16 no.3 / pp.273-290 / 1983
  • A bionomic study of the coralligenous concretionary hard bottom in the northwestern Mediterranean was carried out at four stations: three stations (Arnette, Laurons, Auguette) under the influence of intense multisource pollution (gradient decreasing from north to south) in the Gulf of Fos, west of Marseille, France, and one control station (Moyade islet) in an unpolluted area near Riou island, east of Marseille. Along the increasing pollution gradient from the outer to the inner part of the Gulf of Fos, there is a qualitative and quantitative impoverishment of the fauna. On the whole, the species richness, the numerical abundance, and the species diversity index all decrease. Accordingly, the innermost station in the Gulf of Fos (Auguette) is the most heavily affected by industrial wastes, and to a lesser extent by domestic wastes, from the nearby industrial complex and urban areas. The impact of this serious alteration on the benthic coralligenous community has been analysed in terms of community composition, functional aspects, and ecological stocks. The faunal affinity between stations has been studied by means of two coefficients: the fourfold point correlation coefficient and Jaccard's community coefficient. The upper layer and inferior face communities of the coralligenous concretionary structures are also compared.
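Both affinity coefficients named above can be computed from a 2x2 presence/absence table for a pair of stations: a species found at both, b at the first only, c at the second only, and d at neither. The sketch below uses the standard definitions; the species lists, the shared species pool, and the function names are hypothetical.

```python
import math

def fourfold_counts(station_1, station_2, species_pool):
    """Counts a (both), b (first only), c (second only), d (neither) over a species pool."""
    a = len(station_1 & station_2)
    b = len(station_1 - station_2)
    c = len(station_2 - station_1)
    d = len(species_pool - station_1 - station_2)
    return a, b, c, d

def jaccard_community(a, b, c):
    """Jaccard's community coefficient: shared species over all species present."""
    present = a + b + c
    return a / present if present else 0.0

def point_correlation(a, b, c, d):
    """Fourfold point correlation (phi) coefficient of the 2x2 table."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0

# Hypothetical species lists for two stations and the pool surveyed overall.
species_pool = {"s1", "s2", "s3", "s4", "s5", "s6"}
station_1 = {"s1", "s2", "s3"}
station_2 = {"s2", "s3", "s4"}

a, b, c, d = fourfold_counts(station_1, station_2, species_pool)
jaccard = jaccard_community(a, b, c)
phi = point_correlation(a, b, c, d)
```

The two coefficients differ in how they treat joint absences: Jaccard's coefficient ignores d, while the point correlation depends on it, so the choice of species pool affects only the latter.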
