• Title/Summary/Keyword: topic distillation

Search Result 2, Processing Time 0.015 seconds

An Experimental Study on Topic Distillation Using Web Site Structure (웹 사이트 구조를 이용한 토픽 검색 연구)

  • Lee, Jee-Suk;Chung, Yung-Mee
    • Journal of the Korean Society for information Management
    • /
    • v.24 no.3
    • /
    • pp.201-218
    • /
    • 2007
  • This study proposes a topic distillation algorithm that ranks the relevant sites selected from retrieved web pages, and evaluates the performance of the algorithm. The algorithm calculates the topic score of a site using its hierarchical structure. The TREC .GOV test collection and a set of TREC-2004 queries for topic distillation task are used for the experiment. The experimental results showed the algorithm returned at least 2 relevant sites in top ten retrieval results. We peformed an in-depth analysis of the relevant sites list provided by TREC-2004 to find out that the definition of topic distillation was not strictly applied in selecting relevant sites. When we re-evaluated the retrieved sites/sub-sites using the revised list of relevant sites, the performance of the proposed algorithm was improved significantly.

Novel Category Discovery in Plant Species and Disease Identification through Knowledge Distillation

  • Jiuqing Dong;Alvaro Fuentes;Mun Haeng Lee;Taehyun Kim;Sook Yoon;Dong Sun Park
    • Smart Media Journal
    • /
    • v.13 no.7
    • /
    • pp.36-44
    • /
    • 2024
  • Identifying plant species and diseases is crucial for maintaining biodiversity and achieving optimal crop yields, making it a topic of significant practical importance. Recent studies have extended plant disease recognition from traditional closed-set scenarios to open-set environments, where the goal is to reject samples that do not belong to known categories. However, in open-world tasks, it is essential not only to define unknown samples as "unknown" but also to classify them further. This task assumes that images and labels of known categories are available and that samples of unknown categories can be accessed. The model classifies unknown samples by learning the prior knowledge of known categories. To the best of our knowledge, there is no existing research on this topic in plant-related recognition tasks. To address this gap, this paper utilizes knowledge distillation to model the category space relationships between known and unknown categories. Specifically, we identify similarities between different species or diseases. By leveraging a fine-tuned model on known categories, we generate pseudo-labels for unknown categories. Additionally, we enhance the baseline method's performance by using a larger pre-trained model, dino-v2. We evaluate the effectiveness of our method on the large plant specimen dataset Herbarium 19 and the disease dataset Plant Village. Notably, our method outperforms the baseline by 1% to 20% in terms of accuracy for novel category classification. We believe this study will contribute to the community.