• Title/Summary/Keyword: web document clustering

Search Result 54, Processing Time 0.027 seconds

A Hybrid Document Clustering for a Web Agent (웹 에이전트를 위한 통합방식 문서 클러스터링)

  • Yang, Chan-Beom;Lee, Seong-Yeol;Park, Yeong-Taek
    • Journal of KIISE:Software and Applications
    • /
    • v.28 no.5
    • /
    • pp.422-430
    • /
    • 2001
  • 웹 에이전트는 사용자가 웹을 브라우징하는 행위를 모니터하여 사용자의 관심 정보를 학습하고 사용자가 필요로 하는 웹 상의 정보를 자동 제공하는 지능형 시스템이다. 웹 에이전트가 사용자의 선호도를 학습하기 위해서는 귀납적 기계학습을 수행하는데, 이때 학습의 효율을 높이기 위해서는 사용자가 관심있어하는 문서들을 유사한 문서들로 클러스터링하여 학습 시스템에 제공하여야 한다. 본 논문에서는 웹 에이전트의 학습 시스템에 입력되는 학습대상 문서들을 보다 정확하고 효율적으로 클러스터링하여 제공하기 위해서 Top-down 방식과 Bottom-up 방식을 통합 적용한 통합방식 문서 클러스터링과 초기 클러스터 생성을 위한 평가함수를 제시한다. Top-down 방식으로는 개념적 클러스터링 알고리즘인 COBWEB을 적용하고, Bottom-up 방식으로는 교차기반(Intersection-based) 클러스터링 방식인 Etzioni의 클러스터링 알고리즘을 적용하였다.

  • PDF

Resampling Feedback Documents Using Overlapping Clusters (중첩 클러스터를 이용한 피드백 문서의 재샘플링 기법)

  • Lee, Kyung-Soon
    • The KIPS Transactions:PartB
    • /
    • v.16B no.3
    • /
    • pp.247-256
    • /
    • 2009
  • Typical pseudo-relevance feedback methods assume the top-retrieved documents are relevant and use these pseudo-relevant documents to expand terms. The initial retrieval set can, however, contain a great deal of noise. In this paper, we present a cluster-based resampling method to select better pseudo-relevant documents based on the relevance model. The main idea is to use document clusters to find dominant documents for the initial retrieval set, and to repeatedly feed the documents to emphasize the core topics of a query. Experimental results on large-scale web TREC collections show significant improvements over the relevance model. For justification of the resampling approach, we examine relevance density of feedback documents. The resampling approach shows higher relevance density than the baseline relevance model on all collections, resulting in better retrieval accuracy in pseudo-relevance feedback. This result indicates that the proposed method is effective for pseudo-relevance feedback.

Implementation of a Web Document Clustering System Using Word2Vec (Word2Vec을 이용한 웹 문서 클러스터링 시스템 구현)

  • Yi, Hyun Seok;Ahn, Sung Hun;Lee, Yong Hwan;Cheon, Myung Jae;Park, Hyeok Ju;Park, Mee Hwa;Lee, Yong Kyu
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2016.10a
    • /
    • pp.26-29
    • /
    • 2016
  • 웹 문서 추천 시스템에서는 유사한 내용의 문서임에도 불구하고 URL이 달라서 다른 문서로 인식하여 사용자에게 추천하는 데이터 희소성 문제가 있다. 여기서 기존 연구들은 이 문제에 대한 해결 방법으로 TF-IDF를 이용하였으나 비용 및 시간의 한계가 있으며 유의어 분류 문제가 있다. 본 논문에서는 Word2Vec을 이용한 웹문서 학습 시스템을 통해 문제를 해결한다. 제안 시스템은 언론사의 뉴스를 수집하고 이를 정형화된 형식으로 분석하여 가공하는 전처리 과정을 거친 후 Word2Vec 학습을 통해 문서 벡터를 생성하고 이를 K-Means 클러스터링으로 유사 문서군으로 분류한다. 이 시스템을 이용하면 데이터 희소성 문제를 해결할 뿐만 아니라 연산량이 TF-IDF에 비해 줄어들고 유의어 분류 시 유사도가 높아지는 강점이 있다.

A Literature Review and Classification of Recommender Systems on Academic Journals (추천시스템관련 학술논문 분석 및 분류)

  • Park, Deuk-Hee;Kim, Hyea-Kyeong;Choi, Il-Young;Kim, Jae-Kyeong
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.1
    • /
    • pp.139-152
    • /
    • 2011
  • Recommender systems have become an important research field since the emergence of the first paper on collaborative filtering in the mid-1990s. In general, recommender systems are defined as the supporting systems which help users to find information, products, or services (such as books, movies, music, digital products, web sites, and TV programs) by aggregating and analyzing suggestions from other users, which mean reviews from various authorities, and user attributes. However, as academic researches on recommender systems have increased significantly over the last ten years, more researches are required to be applicable in the real world situation. Because research field on recommender systems is still wide and less mature than other research fields. Accordingly, the existing articles on recommender systems need to be reviewed toward the next generation of recommender systems. However, it would be not easy to confine the recommender system researches to specific disciplines, considering the nature of the recommender system researches. So, we reviewed all articles on recommender systems from 37 journals which were published from 2001 to 2010. The 37 journals are selected from top 125 journals of the MIS Journal Rankings. Also, the literature search was based on the descriptors "Recommender system", "Recommendation system", "Personalization system", "Collaborative filtering" and "Contents filtering". The full text of each article was reviewed to eliminate the article that was not actually related to recommender systems. Many of articles were excluded because the articles such as Conference papers, master's and doctoral dissertations, textbook, unpublished working papers, non-English publication papers and news were unfit for our research. We classified articles by year of publication, journals, recommendation fields, and data mining techniques. The recommendation fields and data mining techniques of 187 articles are reviewed and classified into eight recommendation fields (book, document, image, movie, music, shopping, TV program, and others) and eight data mining techniques (association rule, clustering, decision tree, k-nearest neighbor, link analysis, neural network, regression, and other heuristic methods). The results represented in this paper have several significant implications. First, based on previous publication rates, the interest in the recommender system related research will grow significantly in the future. Second, 49 articles are related to movie recommendation whereas image and TV program recommendation are identified in only 6 articles. This result has been caused by the easy use of MovieLens data set. So, it is necessary to prepare data set of other fields. Third, recently social network analysis has been used in the various applications. However studies on recommender systems using social network analysis are deficient. Henceforth, we expect that new recommendation approaches using social network analysis will be developed in the recommender systems. So, it will be an interesting and further research area to evaluate the recommendation system researches using social method analysis. This result provides trend of recommender system researches by examining the published literature, and provides practitioners and researchers with insight and future direction on recommender systems. We hope that this research helps anyone who is interested in recommender systems research to gain insight for future research.