• Title/Summary/Keyword: Jaccard

Personalized Bookmark Recommendation System Using Tag Network (태그 네트워크를 이용한 개인화 북마크 추천시스템)

  • Eom, Tae-Young;Kim, Woo-Ju;Park, Sang-Un
    • The Journal of Society for e-Business Studies / v.15 no.4 / pp.181-195 / 2010
  • Participation and sharing among individual users are the driving force of Web 2.0 and are easily found in blogs, social networks, collective intelligence, social bookmarking, and tagging. Among these applications, social bookmarking lets Internet users store bookmarks online and share them, and provides various services based on the shared bookmarks that people consider important. Delicious.com is the representative social bookmarking site and provides a bookmark search service that uses the tags users attach to bookmarks. Our paper suggests a method for re-ranking the Delicious.com rankings based on user tags in order to provide personalized bookmark recommendations. Moreover, we suggest a method that also considers bookmarks whose tags are not directly related to the user's query keywords, using a tag network based on the Jaccard similarity coefficient. The performance of the suggested system is verified with experiments that compare the Delicious.com ranks with the new ranks produced by our system.
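
The abstract does not say how the tag network is built. As a minimal sketch, assuming each tag is represented by the set of bookmarks it is attached to, the Jaccard similarity coefficient |A ∩ B| / |A ∪ B| can weight the edge between two tags. The function names, the threshold parameter, and the toy data are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Return |A ∩ B| / |A ∪ B|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_tag_network(tag_to_bookmarks: dict, threshold: float = 0.0) -> dict:
    """Return {(tag1, tag2): similarity} for tag pairs whose similarity exceeds the threshold."""
    edges = {}
    for t1, t2 in combinations(sorted(tag_to_bookmarks), 2):
        sim = jaccard(tag_to_bookmarks[t1], tag_to_bookmarks[t2])
        if sim > threshold:
            edges[(t1, t2)] = sim
    return edges


# Hypothetical toy data: tag -> ids of the bookmarks carrying that tag.
tags = {
    "python": {1, 2, 3, 4},
    "programming": {2, 3, 4, 5},
    "cooking": {6, 7},
}
network = build_tag_network(tags)
print(network)  # "programming" and "python" share 3 of 5 bookmarks
```

With such a network, a query for "python" could also surface bookmarks tagged only "programming", which is the kind of indirect match the abstract describes.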

Isolation, Characterization and Numerical Taxonomy of Novel Oxalate-oxidizing Bacteria

  • Sahin, Nurettin;Gokler, Isa;Tamer, Abdurrahman
    • Journal of Microbiology / v.40 no.2 / pp.109-118 / 2002
  • The present work aims to provide additional new pure cultures of oxalate-utilizing bacteria and their preliminary characterization for further work in the field of oxalate metabolism and taxonomic studies. The taxonomy of 14 mesophilic, aerobic oxalotrophic bacteria isolated by an enrichment culture technique from soils, rhizospheres, and the juice of the petiole/stem tissue of plants was investigated. Isolates were characterized with 95 morphological, biochemical and physiological tests. Cellular lipid components and carotenoids of isolates were also studied as an aid to taxonomic characterization. All isolates were Gram-negative, oxidase and catalase positive, and required no growth factors. In addition to oxalates, some of the strains grow on methanol and/or formate. The taxonomic similarities among isolates, reference strains and previously reported oxalotrophic bacteria were analysed by using the Simple Matching ($S_{SM}$) and Jaccard ($S_J$) coefficients. Clustering was performed by using the unweighted pair group method with arithmetic averages (UPGMA) algorithm. The oxalotrophic strains formed five major and two single-member clusters at the 70-86% similarity level. Based on the numerical taxonomy, isolates were separated into three phenotypic groups. Pink-pigmented strains belonged to Methylobacterium extorquens, yellow-pigmented strains were most similar to Pseudomonas sp. YOx and Xanthobacter autotrophicus, and heterogeneous non-pigmented strains were closely related to the genera Azospirillum, Ancylobacter, Burkholderia and Pseudomonas. New strains belonging to the genera Pseudomonas, Azospirillum and Ancylobacter that differ taxonomically from other known oxalate oxidizers were obtained. Numerical analysis indicated that some strains of the yellow-pigmented and non-pigmented clusters might represent new species.
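
The two coefficients differ in how they treat tests that are negative for both strains. With a = both positive, b and c = positive in only one strain, and d = both negative, the simple matching coefficient is (a + d) / (a + b + c + d) and the Jaccard coefficient is a / (a + b + c). The sketch below illustrates this on made-up test results; the strain vectors are hypothetical, not data from the paper.

```python
def simple_matching(x, y):
    """Fraction of tests on which the two strains agree (shared negatives count)."""
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)


def jaccard_binary(x, y):
    """Shared positives divided by tests positive in at least one strain."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return both / either if either else 0.0


# Hypothetical results of 8 phenotypic tests (1 = positive, 0 = negative).
strain_a = [1, 1, 0, 0, 1, 0, 0, 0]
strain_b = [1, 0, 0, 0, 1, 0, 0, 1]

s_sm = simple_matching(strain_a, strain_b)  # 6 agreements out of 8 tests
s_j = jaccard_binary(strain_a, strain_b)    # 2 shared positives out of 4
print(s_sm, s_j)
```

Because shared negatives raise the simple matching score but not the Jaccard score, the same pair of strains can look more similar under the former than under the latter.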

Genetic Diversity of Korean Barley (Hordeum vulgare L.) Varieties Using Microsatellite Markers (Microsatellite 마커를 이용한 한국 보리 품종의 유전적 다양성)

  • Kwon, Yong-Sham;Hong, Jee-Hwa;Choi, Keun-Jin
    • Korean Journal of Breeding Science / v.43 no.4 / pp.322-329 / 2011
  • Microsatellite markers were used to investigate genetic diversity among 70 Korean barley varieties (Hordeum vulgare). Ninety-nine microsatellite primer pairs were screened on 9 varieties, and twenty primer pairs were highly polymorphic. The relationship between marker genotypes and the 70 varieties was analyzed. A total of 124 polymorphic amplified fragments were obtained by using 20 microsatellite markers. Two to nine SSR alleles were detected for each locus, with an average of 6.2 alleles per locus. The average polymorphism information content (PIC) was 0.734, ranging from 0.498 to 0.882. A total of 124 marker loci were used to calculate Jaccard's distance coefficients for cluster analysis using UPGMA. The clustering divided the varieties into 2 groups corresponding to 2-rowed and 6-rowed barley. The phenogram discriminated all varieties by their marker genotypes. These markers may be used in a wide range of practical applications in variety identification and genetic purity assessment of barley.
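
The pipeline named here (Jaccard distances on presence/absence marker data, then UPGMA) can be sketched in a few lines. This is a naive illustration under stated assumptions, not the software used in the study: the variety names and band profiles are invented, and cluster distance is taken as the plain average of all pairwise member distances, which is the UPGMA definition.

```python
from itertools import combinations


def jaccard_distance(x, y):
    """1 - (shared bands / bands present in either profile)."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return 1.0 - (both / either if either else 1.0)


def upgma(profiles):
    """Return the merge history as (cluster_a, cluster_b, distance) tuples."""
    names = sorted(profiles)
    dist = {frozenset(p): jaccard_distance(profiles[p[0]], profiles[p[1]])
            for p in combinations(names, 2)}
    clusters = [(n,) for n in names]
    merges = []

    def avg_dist(c1, c2):
        # Unweighted average over every pair of members from the two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg_dist(*pair))
        merges.append((c1, c2, avg_dist(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical band profiles (1 = fragment present) for four invented varieties.
profiles = {
    "two_row_1": [1, 1, 1, 0, 0, 0, 0],
    "two_row_2": [1, 1, 0, 0, 0, 0, 0],
    "six_row_1": [0, 0, 0, 1, 1, 1, 1],
    "six_row_2": [0, 0, 0, 1, 1, 1, 0],
}
merges = upgma(profiles)
for a, b, d in merges:
    print(a, b, round(d, 3))
```

In this toy data the last merge joins the two-rowed pair with the six-rowed pair at distance 1.0, mirroring the two-group split reported in the abstract.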

Content Recommendation Techniques for Personalized Software Education (개인화된 소프트웨어 교육을 위한 콘텐츠 추천 기법)

  • Kim, Wan-Seop
    • Journal of Digital Convergence / v.17 no.8 / pp.95-104 / 2019
  • Recently, software education has been emphasized as a key element of the fourth industrial revolution. Many universities are strengthening software education for all students in response to the needs of the times. The use of online content is an effective way to introduce SW education for all students. However, providing uniform online content is limited in that it does not consider the individual characteristics of students (major, SW interest, comprehension, interests, etc.). In this study, we propose a recommendation method that utilizes the directional similarity between contents in a boolean view-history data environment. We propose a new item-based recommendation formula that uses the confidence value of association rule analysis as the similarity level, and apply it to data from a domestic paid content site. Experimental results show that recommendation accuracy is higher than with traditional collaborative recommendation using cosine or Jaccard similarity measures.
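
The abstract does not give the proposed formula, but the standard association-rule confidence it builds on is easy to show: confidence(A → B) is the share of users who viewed A that also viewed B, so it depends on direction, whereas Jaccard is symmetric. The sketch below uses hypothetical content names and viewer sets purely for illustration.

```python
def confidence(viewers_a: set, viewers_b: set) -> float:
    """Association-rule confidence of A -> B: P(viewed B | viewed A)."""
    return len(viewers_a & viewers_b) / len(viewers_a) if viewers_a else 0.0


def jaccard(viewers_a: set, viewers_b: set) -> float:
    """Symmetric overlap: shared viewers divided by all viewers of either item."""
    union = viewers_a | viewers_b
    return len(viewers_a & viewers_b) / len(union) if union else 0.0


# Hypothetical boolean view history: content -> users who viewed it.
views = {
    "intro_python": {"u1", "u2", "u3", "u4"},
    "advanced_ml": {"u1", "u2"},
}

intro_to_ml = confidence(views["intro_python"], views["advanced_ml"])
ml_to_intro = confidence(views["advanced_ml"], views["intro_python"])
symmetric = jaccard(views["intro_python"], views["advanced_ml"])
print(intro_to_ml, ml_to_intro, symmetric)
```

Here everyone who viewed the advanced item also viewed the introductory one, but only half of the introductory viewers went on to the advanced item; Jaccard collapses both directions into a single value.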

Genetic diversity of grapevine (Vitis vinifera L.) as revealed by ISSR markers

  • Basheer-Salimia, Rezq;Mujahed, Arwa
    • Journal of Plant Biotechnology / v.46 no.1 / pp.1-8 / 2019
  • The main goal of this study was to determine the genetic diversity among 36 grape cultivars grown in Palestine by using ISSR-polymerase chain reaction (PCR) fingerprints. Among the tested primers, 17 produced reasonable amplification products with high intensity and pattern stability. A total of 57 DNA fragments (loci) separated by electrophoresis on agarose gels were detected, ranging in size from 150 to 900 bp. Out of these fragments, 55 (88%) were polymorphic and 2 (3.5%) monomorphic. Our results also revealed an average of 3.1 loci per primer. A minimum of 1 and a maximum of 10 DNA fragments were obtained with the (S-17, #820 and #841) and (S-31) primers, respectively. Therefore, the latter primer (S-31) is considered to be the most powerful primer among the tested ones. The genetic distance matrix showed an average distance range of 0.05 to 0.76. The maximum genetic distance of 0.76 (24% similarity) was found between the (Shami and Marawi.Hamadani.Adi) as well as the (Bairuti and Marawi.Hamadani.Adi) genotypes. On the other hand, the lowest genetic distance of 0.05 (95% similarity) was found between the (Jandali.Tawel.Mofarad and Jandali.Kurawi.Mlzlz) as well as the (Shami.Aswad and Shami.mtartash.mlwn) genotypes. Furthermore, the UPGMA dendrogram generally clusters the grape cultivars into eight major clusters in addition to an isolated genotype. Based on these figures, the cultivars tested in this study could be characterized by large divergence at the DNA level. This assumes that our region has a very rich and varied clonal grape genetic structure.

Analysis of Symptoms-Herbs Relationships in Shanghanlun Using Text Mining Approach (텍스트마이닝 기법을 이용한 『상한론』 내의 증상-본초 조합의 탐색적 분석)

  • Jang, Dongyeop;Ha, Yoonsu;Lee, Choong-Yeol;Kim, Chang-Eop
    • Journal of Physiology & Pathology in Korean Medicine / v.34 no.4 / pp.159-169 / 2020
  • Shanghanlun (Treatise on Cold Damage Diseases) is the oldest document in the literature on clinical records of Traditional Asian medicine (TAM), on which TAM theories about symptom-herb relationships are based. In this study, we aim to quantitatively explore the relationships between symptoms and herbs in Shanghanlun. The text of Shanghanlun was converted into structured data. Using the structured data, Term Frequency - Inverse Document Frequency (TF-IDF) scores of symptoms and herbs were calculated for each chapter to derive the major symptoms and herbs in each chapter. To understand the structure of the entire document, principal component analysis (PCA) was performed for the 6-dimensional chapter space. Bipartite network analysis was conducted focusing on Jaccard scores between symptoms and herbs and the eigenvector centralities of nodes. TF-IDF scores showed the characteristics of each chapter through its major symptoms and herbs. Principal components drawn by PCA suggested the overall structure of Shanghanlun. The network analysis revealed a 'multi herbs - multi symptoms' relationship. Common symptoms and herbs were identified from high eigenvector centralities of their nodes, while specific symptoms and herbs were identified from low centralities. The symptoms expected to be treated by each herb were derived. Using measurable metrics, we conducted a computational study on patterns in Shanghanlun. Quantitative research on TAM theories will contribute to improving their clarity.

Comparison of the copy-neutral loss of heterozygosity identified from whole-exome sequencing data using three different tools

  • Lee, Gang-Taik;Chung, Yeun-Jun
    • Genomics & Informatics / v.20 no.1 / pp.4.1-4.8 / 2022
  • Loss of heterozygosity (LOH) is a genomic aberration. In some cases, LOH can be generated without changing the copy number, which is called copy-neutral LOH (CN-LOH). CN-LOH frequently occurs in various human diseases, including cancer. However, the biological and clinical implications of CN-LOH for human diseases have not been well studied. In this study, we compared the performance of CN-LOH determination using three commonly used tools. For an objective comparison, we analyzed CN-LOH profiles from single-nucleotide polymorphism array data from 10 colon adenocarcinoma patients, which were used as the reference for comparison with the CN-LOHs obtained through whole-exome sequencing (WES) data of the same patients using three different analysis tools (FACETS, Nexus, and Sequenza). The majority of the CN-LOHs identified from the WES data were consistent with the reference data. However, some of the CN-LOHs identified from the WES data were not consistent between the three tools, and the consistency with the reference CN-LOH profile was also different. The Jaccard index of the CN-LOHs using FACETS (0.84 ± 0.29; mean value, 0.73) was significantly higher than that of Nexus (0.55 ± 0.29; mean value, 0.50; p = 0.02) or Sequenza (0 ± 0.41; mean value, 0.34; p = 0.04). FACETS showed the highest area under the curve value. Taken together, of the three CN-LOH analysis tools, FACETS showed the best performance in identifying CN-LOHs from The Cancer Genome Atlas colon adenocarcinoma WES data. Our results will be helpful in exploring the biological or clinical implications of CN-LOH for human diseases.
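
The abstract does not state how the Jaccard index was computed for CN-LOH calls. One common reading, assumed here, is a base-pair overlap between the called segments and the reference segments: total intersected length divided by total union length. The sketch below handles a single chromosome with half-open (start, end) intervals; the coordinates are invented for illustration.

```python
def merge(intervals):
    """Merge overlapping or touching (start, end) intervals into a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def total_length(intervals):
    """Sum of interval lengths; assumes the intervals do not overlap."""
    return sum(end - start for start, end in intervals)


def interval_jaccard(a, b):
    """Base-pair Jaccard index: intersected length / union length."""
    a, b = merge(a), merge(b)
    inter = 0
    i = j = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            inter += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    union = total_length(a) + total_length(b) - inter
    return inter / union if union else 0.0


# Hypothetical segments on one chromosome (coordinates are made up).
reference = [(100, 200), (300, 400)]
called = [(150, 250), (300, 350)]
score = interval_jaccard(reference, called)
print(score)  # 100 shared bases out of a 250-base union
```

A per-patient score like this could then be summarized across patients, but whether the study worked per base pair or per segment is an assumption of this sketch.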

An effective automated ontology construction based on the agriculture domain

  • Deepa, Rajendran;Vigneshwari, Srinivasan
    • ETRI Journal / v.44 no.4 / pp.573-587 / 2022
  • The agricultural sector differs from other sectors in that it relies completely on various natural and climatic factors. Climate change has many effects on land and agriculture, including lack of annual rainfall, pests, heat waves, changes in sea level, and global ozone/atmospheric CO2 fluctuation. Climate change also affects the environment. Based on these factors, farmers choose their crops to increase productivity in their fields. Many existing agricultural ontologies are either domain-specific or have been created with minimal vocabulary, and no proper evaluation framework has been implemented. A new agricultural ontology focused on subdomains is designed to assist farmers using the Jaccard relative extractor (JRE) and the Naïve Bayes algorithm. The JRE is used to find the similarity between two sentences and words in the agricultural documents, and the relationship between two terms is identified via the Naïve Bayes algorithm. In the proposed method, data are preprocessed with natural language processing techniques, and the tags whose dimensions are reduced are subjected to rule-based formal concept analysis and mapping. The subdomain ontologies of weather, pest, and soil are built separately, and the overall agricultural ontology is built around them. The gold standard for the lexical layer is used to evaluate the proposed technique, and its performance is analyzed by comparing it with different state-of-the-art systems. Precision, recall, F-measure, Matthews correlation coefficient, receiver operating characteristic curve area, and precision-recall curve area are the performance metrics used in the analysis. The proposed methodology gives a precision score of 94.40%, compared with 83.94% for the decision tree and 86.89% for the K-nearest neighbor algorithm, for agricultural ontology construction.

Analysis of Memory Pool Jacquard Similarity between Bitcoin and Ethereum in the Same Environment (동일한 환경에서 구성된 비트코인과 이더리움의 메모리 풀 자카드 유사도 분석)

  • Maeng, SooHoon;Shin, Hye-yeong;Kim, Daeyong;Ju, Hongtaek
    • KNOM Review / v.22 no.3 / pp.20-24 / 2019
  • Blockchain is a distributed ledger-based technology in which all nodes participating in the blockchain network are connected through a P2P network. When a transaction is created in the blockchain network, it is propagated and validated by the blockchain nodes. The verified transaction is sent to the peers connected to each node through the P2P network, and the peers keep the transaction in their memory pools. Due to the nature of P2P networks, the number and type of transactions delivered by a blockchain node differ from node to node. As a result, not all nodes have the same memory pool. Research is needed to solve problems such as attack detection. In this paper, we analyze transactions in the memory pool as a step before solving problems such as transaction fee manipulation, double spending, and DDoS attack detection. To this end, this study collects the transactions stored in each node's memory pool for Bitcoin and Ethereum, cryptocurrency systems based on blockchain technology, and analyzes how many transactions the pools have in common using Jaccard similarity.

Deep Learning-Based Lumen and Vessel Segmentation of Intravascular Ultrasound Images in Coronary Artery Disease

  • Gyu-Jun Jeong;Gaeun Lee;June-Goo Lee;Soo-Jin Kang
    • Korean Circulation Journal / v.54 no.1 / pp.30-39 / 2024
  • Background and Objectives: Intravascular ultrasound (IVUS) evaluation of coronary artery morphology is based on lumen and vessel segmentation. This study aimed to develop an automatic segmentation algorithm and validate its performance for measuring quantitative IVUS parameters. Methods: A total of 1,063 patients were randomly assigned, with a ratio of 4:1, to the training and test sets. An independent data set of 111 IVUS pullbacks was obtained to assess the vessel-level performance. The lumen and external elastic membrane (EEM) boundaries were labeled manually in every IVUS frame with a 0.2-mm interval. The Efficient-UNet was utilized for the automatic segmentation of IVUS images. Results: At the frame level, Efficient-UNet showed a high Dice similarity coefficient (DSC, 0.93±0.05) and Jaccard index (JI, 0.87±0.08) for lumen segmentation, and demonstrated a high DSC (0.97±0.03) and JI (0.94±0.04) for EEM segmentation. At the vessel level, there were close correlations between model-derived and expert-measured IVUS parameters: minimal lumen image area (r=0.92), EEM area (r=0.88), lumen volume (r=0.99) and plaque volume (r=0.95). The agreement between model-derived and expert-measured minimal lumen area was similarly excellent to the agreement between experts. The model-based lumen and EEM segmentation for a 20-mm lesion segment required 13.2 seconds, whereas manual segmentation with a 0.2-mm interval by an expert took 187.5 minutes on average. Conclusions: The deep learning models can accurately and quickly delineate vascular geometry. The artificial intelligence-based methodology may support clinicians' decision-making by real-time application in the catheterization laboratory.
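
The two overlap metrics reported here are closely related: for a single pair of masks, JI = DSC / (2 − DSC). Applying this to the reported means gives roughly 0.87 for lumen (from 0.93) and 0.94 for EEM (from 0.97), in line with the stated JI values, although the identity holds exactly only per frame and not for averages. The sketch below computes both metrics on flattened binary masks; the masks are hypothetical, not IVUS data.

```python
def dice_and_jaccard(pred, truth):
    """Return (Dice, Jaccard) for two equal-length binary masks."""
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_sum = sum(1 for p in pred if p)
    truth_sum = sum(1 for t in truth if t)
    union = pred_sum + truth_sum - overlap
    # Two empty masks are treated as a perfect match (a convention, assumed here).
    dice = 2 * overlap / (pred_sum + truth_sum) if (pred_sum + truth_sum) else 1.0
    jac = overlap / union if union else 1.0
    return dice, jac


# Hypothetical flattened masks (1 = pixel labeled as lumen).
pred = [0, 1, 1, 1, 1, 0, 0, 0]
truth = [0, 0, 1, 1, 1, 1, 0, 0]

dice, jac = dice_and_jaccard(pred, truth)
print(dice, jac, dice / (2 - dice))  # the last two values agree
```

Since the Jaccard index is never larger than the Dice coefficient for the same masks, papers that report both will show the ordering seen in this abstract.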