• Title/Abstract/Keywords: Query Frequency

Search results: 123 items (processing time: 0.019 seconds)

Development of Personalized Recommendation System using RFM method and k-means Clustering

  • 조영성;구미숙;류근호
    • Journal of the Korea Society of Computer and Information (한국컴퓨터정보학회논문지) / Vol. 17, No. 6 / pp. 163-172 / 2012
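  • The explicit collaborative filtering approach of existing recommender systems has been put into practical use, but it still fails to reflect accurate item attributes and still suffers from sparsity and scalability problems. For ubiquitous commerce, which demands real-time responsiveness and agility, this paper proposes a personalized recommender system that uses the RFM (Recency, Frequency, Monetary) method and k-means clustering in an implicit manner, sparing customers a burdensome question-and-answer process. To extract items with a high likelihood of purchase, it applies the RFM method, which can reflect item attributes, and k-means clustering to customer data and purchase-history data. So that the proposed method can recommend items with high recommendation efficiency, preprocessing performs clustering on feature vectors of customer attribute variables and computes item-category preferences within each cluster. For performance evaluation, a data set was built from the data of an Internet cosmetics shopping mall in commercial use, and comparative experiments against an existing system demonstrated the method's utility and validity.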
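The abstract above does not give the authors' exact pipeline, so the following is only a minimal sketch of the general idea: derive Recency, Frequency, and Monetary values from purchase history and group customers with k-means. The function names, the min-max scaling step, the choice to cluster directly on RFM values, and the "first k points" initialization are all assumptions made for illustration.

```python
from datetime import date


def rfm_features(purchases, today):
    """Build (recency_days, frequency, monetary) per customer.

    purchases: iterable of (customer_id, purchase_date, amount) tuples.
    """
    table = {}
    for customer, day, amount in purchases:
        last, count, total = table.get(customer, (None, 0, 0.0))
        if last is None or day > last:
            last = day
        table[customer] = (last, count + 1, total + amount)
    return {
        customer: ((today - last).days, count, total)
        for customer, (last, count, total) in table.items()
    }


def min_max_scale(rows):
    """Scale every column to [0, 1] so no single RFM axis dominates."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [
        tuple((value - low) / span for value, low, span in zip(row, lows, spans))
        for row in rows
    ]


def k_means(points, k, iterations=20):
    """Plain k-means; assumes len(points) >= k.

    Initialization uses the first k points (an assumption chosen for
    determinism, not something stated in the abstract).
    """
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def cluster_customers(purchases, today, k):
    """Return {customer_id: cluster_label} based on scaled RFM features."""
    features = rfm_features(purchases, today)
    customers = sorted(features)
    scaled = min_max_scale([features[c] for c in customers])
    return dict(zip(customers, k_means(scaled, k)))
```

In a fuller system, the cluster labels would then feed a per-cluster item-category preference calculation, which this sketch leaves out.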

The e-Business Component Construction based on Distributed Component Specification

  • 김행곤;최하정;한은주
    • The KIPS Transactions: Part D (정보처리학회논문지D) / Vol. 8D, No. 6 / pp. 705-714 / 2001
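  • Today's computing systems are expanding over the Internet into business transactions and distributed work processing, and application development based on components that offer reusability, independence, and portability is steadily spreading in information technology. Component development is a form of parts-based development more advanced than code reuse or class libraries, and it is founded on CBD (Component Based Development). However, building new components with CBD is increasingly costly, and effort is needed to develop components that fit business requirements. In addition, a normalized component model is required on the system side so that fast, accurate component information can be supported on the Web. This paper aims to address user requirements and to enable applications to be developed quickly on the Web. It provides interface specifications for the smallest-scale distributed components based on a business domain on the network. The specifications, which capture a component's internal and external relationships, are structured so that user requirements can be analyzed accurately; they are stored as EJBs (Enterprise JavaBeans), an information granularity reusable within the business domain, divided into session and entity forms within a servlet system. Business components can be accessed through queries designed to provide them. The system further aims to build an environment in which components can later be registered, automatically redeployed, searched, tested, and downloaded, thereby increasing component reusability, reducing cost, and making distributed components easy for users to use.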


A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model

  • 조원진;노상규;윤지영;박진수
    • Asia Pacific Journal of Information Systems / Vol. 21, No. 1 / pp. 103-122 / 2011
  • Recently, numerous documents have been made available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents. In this situation, it is virtually impossible for users to examine complete documents to determine whether they might be useful. For this reason, some online documents are accompanied by a list of keywords specified by the authors in an effort to guide users by facilitating the filtering process. In this way, a set of keywords is often considered a condensed version of the whole document and therefore plays an important role in document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask authors to provide a list of five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents could also benefit from the use of keywords, including Web pages, email messages, news reports, magazine articles, and business papers. Although the potential benefit is large, the implementation itself is the obstacle: manually assigning keywords to all documents is a daunting and often impractical task, because it is extremely tedious and time-consuming and requires a certain level of domain knowledge. Therefore, it is highly desirable to automate the keyword generation process. There are two main approaches to achieving this aim: the keyword assignment approach and the keyword extraction approach. Both approaches use machine learning methods and require, for training purposes, a set of documents with keywords already attached. In the former approach, there is a given vocabulary, and the aim is to match its entries to the texts. In other words, the keyword assignment approach seeks to select the words from a controlled vocabulary that best describe a document.
Although this approach is domain dependent and is not easy to transfer and expand, it can generate implicit keywords that do not appear in a document. On the other hand, in the latter approach, the aim is to extract keywords with respect to their relevance in the text without a prior vocabulary. In this approach, automatic keyword generation is treated as a classification task, and keywords are commonly extracted based on supervised learning techniques. Thus, keyword extraction algorithms classify candidate keywords in a document into positive or negative examples. Several systems, such as Extractor and Kea, were developed using the keyword extraction approach. The most indicative words in a document are selected as keywords for that document, and as a result, keyword extraction is limited to terms that appear in the document. Therefore, keyword extraction cannot generate implicit keywords that are not included in a document. According to the experimental results of Turney, about 64% to 90% of keywords assigned by the authors can be found in the full text of an article. Conversely, this means that 10% to 36% of the keywords assigned by the authors do not appear in the article and so cannot be generated by keyword extraction algorithms. Our preliminary experiment also shows that 37% of keywords assigned by the authors are not included in the full text. This is the reason why we have decided to adopt the keyword assignment approach. In this paper, we propose a new approach for automatic keyword assignment, namely IVSM (Inverse Vector Space Model). The model is based on the vector space model, a conventional information retrieval model that represents documents and queries as vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets.
The keyword assignment process of IVSM is as follows: (1) calculating the vector length of each keyword set based on each keyword weight; (2) preprocessing and parsing a target document that does not have keywords; (3) calculating the vector length of the target document based on the term frequency; (4) measuring the cosine similarity between each keyword set and the target document; and (5) generating keywords that have high similarity scores. Two keyword generation systems were implemented applying IVSM: an IVSM system for a Web-based community service and a stand-alone IVSM system. The first is implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone IVSM system is dedicated to generating keywords for academic papers, and it has been tested on a number of academic papers, including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between the IVSM-generated keywords and the author-assigned keywords. According to our experiment, the precisions of IVSM applied to the Web-based community service and to academic journals were 0.75 and 0.71, respectively. The performance of both systems is much better than that of baseline systems that generate keywords based on simple probability. IVSM also shows performance comparable to that of Extractor, a representative system of the keyword extraction approach developed by Turney. As the number of electronic documents increases, we expect that the IVSM proposed in this paper can be applied to many electronic documents in Web-based communities and digital libraries.
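The five steps listed in the abstract can be illustrated with a small sketch. It assumes each candidate keyword is stored as a mapping from terms to weights; how the paper actually learns those weights, and how it preprocesses text, is not stated in the abstract, so the `keyword_sets` structure, the whitespace tokenizer, and all function names here are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Very rough preprocessing (assumption): lowercase, keep alphabetic tokens."""
    return [word for word in text.lower().split() if word.isalpha()]


def vector_length(weights):
    """Euclidean length of a sparse term-weight vector (steps 1 and 3)."""
    return math.sqrt(sum(w * w for w in weights.values()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors (step 4)."""
    length_a, length_b = vector_length(a), vector_length(b)
    if length_a == 0 or length_b == 0:
        return 0.0
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    return dot / (length_a * length_b)


def assign_keywords(document, keyword_sets, top_n=2):
    """Rank candidate keywords by similarity to the document (step 5).

    keyword_sets: {keyword: {term: weight}}, a hypothetical representation.
    """
    doc_vector = Counter(tokenize(document))  # step 2 plus term frequencies
    scored = [
        (cosine_similarity(weights, doc_vector), keyword)
        for keyword, weights in keyword_sets.items()
    ]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [keyword for score, keyword in scored[:top_n] if score > 0]
```

Because the candidate keywords come from the stored sets and not from the document's own tokens, a keyword such as "logistics" can be assigned to a text that mentions only "port" and "shipping", which is the property the abstract highlights for the assignment approach.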