• Title/Summary/Keyword: jaccard

Search Result 88, Processing Time 0.027 seconds

Personalized Bookmark Search Word Recommendation System based on Tag Keyword using Collaborative Filtering (협업 필터링을 활용한 태그 키워드 기반 개인화 북마크 검색 추천 시스템)

  • Byun, Yeongho;Hong, Kwangjin;Jung, Keechul
    • Journal of Korea Multimedia Society
    • /
    • v.19 no.11
    • /
    • pp.1878-1890
    • /
    • 2016
  • Web 2.0 is characterized by content produced through user participation and sharing. Content production has become more active since the emergence of social network services. Social bookmarking, one such service, lets users store useful content and share bookmarked content with other users. Unlike Internet search engines such as Google and Naver, content stored in a social bookmark service is searched using tag keyword information, so unnecessary information can be excluded. Social bookmarking gives users access to selected content. However, quickly reaching the content a user wants is difficult because the content is produced through user participation and sharing. This paper proposes a method for recommending search words so that users can reach content quickly. The method uses Collaborative Filtering and the Jaccard similarity coefficient. The performance of the proposed system is verified through experiments that compare it with 'Delicious' and 'Feeltering'.
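
The abstract does not spell out how the two techniques are combined, so the following is only a minimal sketch under assumed details: users are compared by the Jaccard similarity of their tag sets, and tags held by similar users are scored by that similarity. The function names, the scoring rule, and the sample data are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_tags(target, tags_by_user, top_n=3):
    """Score tags the target user lacks by the similarity of the users who hold them."""
    own = set(tags_by_user[target])
    scores = {}
    for user, tags in tags_by_user.items():
        if user == target:
            continue
        sim = jaccard(own, tags)
        if sim == 0:
            continue  # users with no shared tags contribute nothing
        for tag in set(tags) - own:
            scores[tag] = scores.get(tag, 0.0) + sim
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_n]


# Hypothetical bookmark tags, invented for illustration.
tags_by_user = {
    "alice": {"python", "ml", "web"},
    "bob": {"python", "ml", "numpy"},
    "carol": {"cooking", "travel"},
}
print(recommend_tags("alice", tags_by_user))
```

Summing similarities is one simple user-based weighting; a real system might normalize by the number of neighbours or restrict scoring to the k most similar users.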

Application of RAPD markers for characterization of ${\gamma}$-ray-induced rose mutants and assessment of genetic diversity

  • Chakrabarty, D.;Datta, S.K.
    • Plant Biotechnology Reports
    • /
    • v.4 no.3
    • /
    • pp.237-242
    • /
    • 2010
  • Six parents and their 12 gamma ray-induced somatic flower colour mutants of garden rose were characterized using random amplified polymorphic DNA (RAPD) markers to discriminate the mutants from their respective parents and to understand their genetic diversity. Out of 20 primers screened, 14 primers yielded completely identical fragment patterns. The other 7 primers gave highly polymorphic banding patterns among the radiomutants. All the cultivars were identified by using only 7 primers. Moreover, individual mutants were also distinguished by unique RAPD marker bands. Based on the presence or absence of the 48 polymorphic bands, the genetic variations within and among the 18 cultivars were measured. Genetic distance between all 18 cultivars varied from 0.40 to 0.91, as revealed by Jaccard's coefficient matrix. A dendrogram constructed from the similarity matrix using the Neighbor Joining Tree method showed three main clusters. The present RAPD analysis can be used not only for estimating the genetic diversity present in gamma ray-induced mutants but also for correctly identifying mutant/new varieties for their legal protection under plant variety rights.
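
As a rough illustration of how a Jaccard coefficient is obtained from presence/absence band scores, the sketch below compares 0/1 profiles pairwise. The profiles are invented and far shorter than the 48 bands scored in the study, and reporting distance as 1 minus similarity is a common convention that is assumed here, not taken from the abstract.

```python
def jaccard_binary(x, y):
    """Jaccard coefficient for two 0/1 band profiles; shared absences are ignored."""
    if len(x) != len(y):
        raise ValueError("profiles must score the same bands")
    both = sum(1 for p, q in zip(x, y) if p and q)    # band present in both
    either = sum(1 for p, q in zip(x, y) if p or q)   # band present in at least one
    return both / either if either else 0.0


# Hypothetical band profiles (1 = band present, 0 = absent).
profiles = {
    "parent":   [1, 1, 0, 1, 0, 0],
    "mutant_a": [1, 1, 0, 0, 1, 0],
    "mutant_b": [0, 1, 1, 0, 1, 0],
}

names = list(profiles)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        s = jaccard_binary(profiles[a], profiles[b])
        print(f"{a} vs {b}: similarity={s:.2f}, distance={1 - s:.2f}")
```

Because bands absent from both samples are left out of the denominator, two cultivars are not counted as similar merely for lacking the same bands.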

Performance Analysis of Forwarding Schemes Based on Similarities for Opportunistic Networks (기회적 네트워크에서의 유사도 기반의 포워딩 기법의 성능 분석)

  • Kim, Sun-Kyum;Lee, Tae-Seok;Kim, Wan-Jong
    • KIISE Transactions on Computing Practices
    • /
    • v.24 no.3
    • /
    • pp.145-150
    • /
    • 2018
  • Forwarding in opportunistic networks performs poorly because intermittent connectivity may leave no connected path between the source and destination nodes. Social network analysis is currently being researched, and similarity is one of its methods. In this paper, we propose forwarding schemes based on representative similarities and evaluate how much the forwarding performance increases. Because the forwarding schemes are based on similarity, they forward messages only to nodes with higher similarity, which act as relay nodes toward the destination node. As a result, these schemes have low network traffic and hop count while maintaining a stable transmission delay.

An Experimental Study on Selecting Association Terms Using Text Mining Techniques (텍스트 마이닝 기법을 이용한 연관용어 선정에 관한 실험적 연구)

  • Kim, Su-Yeon;Chung, Young-Mee
    • Journal of the Korean Society for information Management
    • /
    • v.23 no.3 s.61
    • /
    • pp.147-165
    • /
    • 2006
  • In this study, experiments on the selection of association terms were conducted in order to find the best method for selecting additional terms related to an initial query term. Association term sets were generated by using the support, confidence, and lift measures of the Apriori algorithm, and also by using similarity measures such as GSS, Jaccard coefficient, cosine coefficient, Sokal & Sneath 5, and mutual information. In the performance evaluation of the term selection methods, the precision of association terms as well as the overlap ratio between association terms and relevant documents' indexing terms were used. It was found that the Apriori algorithm and GSS achieved the highest level of performance.
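
For readers unfamiliar with these measures, here is a small sketch of support, confidence, and lift for a term pair, with the Jaccard coefficient computed from the same document counts. It uses their standard definitions; the toy documents are invented, and GSS, Sokal & Sneath 5, and mutual information are not covered.

```python
def association_measures(docs, x, y):
    """Support, confidence (x -> y), lift, and Jaccard for terms x and y.

    docs is a list of term sets, one per document.
    """
    if not docs:
        raise ValueError("docs must not be empty")
    n = len(docs)
    n_x = sum(1 for d in docs if x in d)
    n_y = sum(1 for d in docs if y in d)
    n_xy = sum(1 for d in docs if x in d and y in d)

    support = n_xy / n                              # P(x and y)
    confidence = n_xy / n_x if n_x else 0.0         # P(y | x)
    lift = confidence / (n_y / n) if n_y else 0.0   # P(y | x) / P(y)
    union = n_x + n_y - n_xy
    jaccard = n_xy / union if union else 0.0
    return {"support": support, "confidence": confidence,
            "lift": lift, "jaccard": jaccard}


# Hypothetical indexing terms for four documents.
docs = [
    {"jaccard", "similarity", "set"},
    {"jaccard", "similarity"},
    {"similarity", "cosine"},
    {"cluster", "cosine"},
]
print(association_measures(docs, "jaccard", "similarity"))
```

Confidence is directional (it changes if x and y are swapped), whereas support, lift, and the Jaccard coefficient are symmetric in the two terms.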

Load Shedding for Temporal Queries over Data Streams

  • Al-Kateb, Mohammed;Lee, Byung-Suk
    • Journal of Computing Science and Engineering
    • /
    • v.5 no.4
    • /
    • pp.294-304
    • /
    • 2011
  • Enhancing continuous queries over data streams with temporal functions and predicates enriches the expressive power of those queries. While traditional continuous queries retrieve only the values of attributes, temporal continuous queries retrieve the valid time intervals of those values as well. Correctly evaluating such queries requires the coalescing of adjacent timestamps for value-equivalent tuples prior to evaluating temporal functions and predicates. For many stream applications, the available computing resources may be too limited to produce exact query results. These limitations are commonly addressed through load shedding, which produces approximate query results. Many load shedding mechanisms have been proposed so far, but for temporal continuous queries, the presence of coalescing makes these existing methods unsuitable. In this paper, we propose a new accuracy metric and load shedding algorithm that are suitable for temporal query processing when memory is insufficient. The accuracy metric uses a combination of the Jaccard coefficient to measure the accuracy of attribute values and $\mathcal{PQI}$ interval orders to measure the accuracy of the valid time intervals in the approximate query result. The algorithm employs a greedy strategy combining two objectives reflecting the two accuracy metrics (i.e., value and interval). In the performance study, the proposed greedy algorithm outperforms a conventional random load shedding algorithm by up to an order of magnitude in its achieved accuracy.

Analysis of Change Detection Results by UNet++ Models According to the Characteristics of Loss Function (손실함수의 특성에 따른 UNet++ 모델에 의한 변화탐지 결과 분석)

  • Jeong, Mila;Choi, Hoseong;Choi, Jaewan
    • Korean Journal of Remote Sensing
    • /
    • v.36 no.5_2
    • /
    • pp.929-937
    • /
    • 2020
  • In this manuscript, the UNet++ model, one of the representative deep learning techniques for semantic segmentation, was used to detect changes in temporal satellite images. To analyze the learning results according to various loss functions, we evaluated the change detection results of UNet++ models trained with binary cross entropy and the Jaccard coefficient. In addition, the learning results of the deep learning model were compared with existing pixel-based change detection algorithms by using WorldView-3 images. The experiment confirmed that, although the performance of the deep learning model depended on the characteristics of the loss function, it produced better results than the existing techniques.
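
To make the contrast between the two loss functions concrete, the sketch below computes both on a flattened change map in plain Python. The abstract does not give the exact formulations used with UNet++, so the soft (differentiable) Jaccard form and the `smooth` term are assumptions based on common practice, and the pixel values are invented.

```python
import math


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean per-pixel BCE; pred holds probabilities, target holds 0/1 labels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(pred)


def soft_jaccard_loss(pred, target, smooth=1.0):
    """1 - soft IoU, computed over the whole image rather than pixel by pixel."""
    inter = sum(p * t for p, t in zip(pred, target))
    union = sum(pred) + sum(target) - inter
    return 1.0 - (inter + smooth) / (union + smooth)


# Hypothetical flattened change map: one changed pixel out of ten,
# with a model that predicts "almost no change" everywhere.
target = [0] * 9 + [1]
pred = [0.1] * 10

print("BCE:         ", round(binary_cross_entropy(pred, target), 4))
print("Jaccard loss:", round(soft_jaccard_loss(pred, target), 4))
```

BCE averages over all pixels, so a rare changed class contributes little to it, while the Jaccard loss depends only on the overlap between predicted and true change regions. That difference is one plausible reason for results to vary with the loss function.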

Development of a Clustering Model for Automatic Knowledge Classification (지식 분류의 자동화를 위한 클러스터링 모형 연구)

  • 정영미;이재윤
    • Journal of the Korean Society for information Management
    • /
    • v.18 no.2
    • /
    • pp.203-230
    • /
    • 2001
  • The purpose of this study is to develop a document clustering model for automatic classification of knowledge. Two test collections of newspaper article texts and journal article abstracts are built for the clustering experiment. Various feature reduction criteria as well as term weighting methods are applied to the term sets of the test collections, and cosine and Jaccard coefficients are used as similarity measures. The performance of the complete linkage and K-means clustering algorithms is compared using different feature selection methods and various term weights. It was found that complete linkage clustering outperforms the K-means algorithm and that feature reduction up to almost 10% of the total feature sets does not lower the performance of document clustering to any significant extent.

Measure of the Associations of Accupoints and Pathologies Documented in the Classical Acupuncture Literature (고의서에 나타난 경혈과 병증의 연관성 측정 및 시각화 - 침구자생경 분석 예를 중심으로 -)

  • Oh, Junho
    • Korean Journal of Acupuncture
    • /
    • v.33 no.1
    • /
    • pp.18-32
    • /
    • 2016
  • Objectives: This study aims to analyze the co-occurrence of pathological symptoms and corresponding acupoints as documented in the comprehensive acupuncture and moxibustion records of the classical texts of Far East traditional medicine, as an aid to a more efficient understanding of the tacit treatment principles of ancient physicians. Methods: The Classic of Nourishing Life with Acupuncture and Moxibustion (Zhenjiu Zisheng Jing; hereinafter ZZJ) was selected as the primary reference book for the analysis. The pathology-acupoint co-occurrence analysis was performed by applying 4 vector space measures (weighted Euclidean distance, Euclidean distance, Cramér's V and Canberra distance), which measure the distance between the observed and expected co-occurrence counts, and 3 probabilistic measures (association strength, Fisher's exact test and Jaccard similarity), which measure the probability of observed co-occurrences. Results: The treatment records contained in ZZJ were preprocessed, which yielded 4162 pathology-acupoint sets. Co-occurrence analysis was performed by applying the 7 measures, followed by a prediction simulation. The prediction simulation revealed that the weighted Euclidean distance had the highest prediction rate at 24.32%, followed by Canberra distance (23.14%) and association strength (21.29%). Conclusions: The weighted Euclidean distance among the vector space measures and the association strength among the probabilistic measures were verified to be the most efficient methods for analyzing the correlation between acupoints and pathologies found in the classical medical texts.

Exploration of Hierarchical Techniques for Clustering Korean Author Names (한글 저자명 군집화를 위한 계층적 기법 비교)

  • Kang, In-Su
    • Journal of Information Management
    • /
    • v.40 no.2
    • /
    • pp.95-115
    • /
    • 2009
  • Author resolution disambiguates same-name author occurrences into real individuals. For this, pair-wise similarities are computed between author name entities, and then clustering is performed. So far, many studies have employed hierarchical clustering techniques for author disambiguation. However, the various hierarchical clustering methods have not been sufficiently investigated. This study presents an empirical evaluation and analysis of hierarchical clustering applied to Korean author resolution, using multiple distance functions such as the Dice coefficient, Cosine similarity, Euclidean distance, Jaccard coefficient, and Pearson correlation coefficient.
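
As a small illustration of the pair-wise similarity step, the sketch below computes three of the listed measures on set-valued features of two same-name author records. The feature names are hypothetical; the abstract does not say which features were used, and the Euclidean and Pearson measures (which need numeric vectors) are omitted.

```python
import math


def overlap_scores(a, b):
    """Jaccard, Dice, and set-based cosine similarity of two feature sets."""
    a, b = set(a), set(b)
    if not a or not b:
        return {"jaccard": 0.0, "dice": 0.0, "cosine": 0.0}
    inter = len(a & b)
    return {
        "jaccard": inter / len(a | b),
        "dice": 2 * inter / (len(a) + len(b)),
        "cosine": inter / math.sqrt(len(a) * len(b)),
    }


# Hypothetical features of two records that share an author name.
record_1 = {"coauthor:Lee", "coauthor:Park", "kw:clustering", "kw:retrieval"}
record_2 = {"coauthor:Lee", "kw:clustering", "kw:indexing"}

scores = overlap_scores(record_1, record_2)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Dice and Jaccard are related by Dice = 2J / (1 + J), so they always rank record pairs in the same order, although their different scales can still matter for threshold-based cluster cut-offs.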

RAPD Polymorphism and Genetic Distance among Phenotypic Variants of Tamarindus indica

  • Mayavel, A;Vikashini, B;Bhuvanam, S;Shanthi, A;Kamalakannan, R;Kim, Ki-Won;Kang, Kyu-Suk
    • Journal of Korean Society of Forest Science
    • /
    • v.109 no.4
    • /
    • pp.421-428
    • /
    • 2020
  • Tamarind (Tamarindus indica L.) is a multipurpose tree species distributed in tropical and sub-tropical climates. It is an important fruit-yielding tree that supports livelihoods and has high social and cultural value for rural communities. The vegetative, reproductive, qualitative, and quantitative traits of tamarind vary widely. Characterization of phenotypic and genetic structure is essential for the selection of suitable accessions for sustainable cultivation and conservation. This study aimed to examine the genetic relationship among the collected accessions of sweet, red, and sour tamarind by using Random Amplified Polymorphic DNA (RAPD) primers. Nine accessions were collected from germplasm gene banks and subjected to marker analysis. Fifteen highly polymorphic primers generated a total of 169 fragments, out of which 138 bands were polymorphic. The polymorphic information content of the RAPD markers varied from 0.10 to 0.44, and Jaccard's similarity coefficient values ranged from 0.37 to 0.70. The genetic clustering showed sizable genetic variation among the tamarind accessions at the molecular level. The molecular and biochemical variations in the selected accessions are very important for developing varieties with high sugar, anthocyanin, and acidity traits in the ongoing tamarind improvement program.