• Title/Summary/Keyword: similarity query (유사도 질의)

Search Result 1,858, Processing Time 0.028 seconds

A New Collaborative Filtering Method for Movie Recommendation Using Genre Interest (영화 추천을 위한 장르 흥미도를 이용한 새로운 협력 필터링 방식)

  • Lee, Soojung
    • Journal of Digital Convergence / v.12 no.8 / pp.329-335 / 2014
  • Collaborative filtering has been popular in commercial recommender systems, as it successfully captures the social behavior of customers by suggesting items that might fit the interests of a user. So far, the most common method of finding suitable items to recommend is to search for similar users and consult their ratings. This paper suggests a new similarity measure for movie recommendation that is based on genre interest, instead of the differences between ratings made by two users as in previous similarity measures. In extensive experiments, the proposed measure is shown to perform significantly better than classic similarity measures in terms of both prediction and recommendation quality.
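
The abstract does not define the genre-interest measure, so the following is only a minimal sketch of the general idea under stated assumptions: a user's genre interest is taken to be the share of that user's rated movies that fall in each genre, and two users are compared by the cosine similarity of those vectors. The function names and the data layout are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def genre_interest(rated_movies, movie_genres):
    """Assumed definition: fraction of a user's genre occurrences per genre."""
    counts = Counter(g for m in rated_movies for g in movie_genres[m])
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def cosine(u, v):
    """Cosine similarity of two sparse genre-interest vectors (dicts)."""
    dot = sum(x * v.get(g, 0.0) for g, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy data: movie id -> list of genres.
movie_genres = {"m1": ["action"], "m2": ["action", "comedy"], "m3": ["drama"]}
alice = genre_interest(["m1", "m2"], movie_genres)
carol = genre_interest(["m3"], movie_genres)
print(cosine(alice, carol))
```

Unlike rating-difference measures, a similarity of this kind can be computed even when two users have rated no movies in common, which is one plausible motivation for working at the genre level.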

Efficient Inverted List Search Technique using Bitmap Filters (비트맵 필터를 이용한 효율적인 역 리스트 탐색 기법)

  • Kwon, In-Teak;Kim, Jong-Ik
    • The KIPS Transactions:PartD / v.18D no.6 / pp.415-422 / 2011
  • Finding similar strings is an important operation because textual data can have errors, duplications, and inconsistencies by nature. Many algorithms have been developed for approximate string search, and most of them use inverted lists to find similar strings. These algorithms basically perform merge operations on inverted lists. In this paper, we develop a bitmap representation of an inverted list and propose an efficient search algorithm that uses bitmap filters to skip unnecessary inverted lists without searching them. Experimental results show that the proposed technique consistently improves search performance.
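
The idea of skipping inverted lists with a bitmap can be sketched as follows. This is not the paper's algorithm, whose details the abstract does not give. It is an illustrative variant that assumes 2-gram inverted lists, a 64-bit bitmap per list (bit `id % 64` is set for every string id in the list), and a count threshold `min_shared`. Candidates are collected from the shortest lists, and any remaining list whose bitmap shares no bit with the candidates' bitmap is skipped without being searched. All names here are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict

BITMAP_BITS = 64  # assumed bitmap width; not specified in the abstract


def qgrams(s, q=2):
    """Return the set of q-grams of s, padded at both ends."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def build_index(strings, q=2):
    """Build sorted inverted lists and one bitmap summary per list."""
    lists, bitmaps = defaultdict(list), defaultdict(int)
    for sid, s in enumerate(strings):
        for g in qgrams(s, q):
            lists[g].append(sid)
            bitmaps[g] |= 1 << (sid % BITMAP_BITS)
    return lists, bitmaps


def contains(sorted_ids, sid):
    """Binary search for sid in a sorted inverted list."""
    i = bisect_left(sorted_ids, sid)
    return i < len(sorted_ids) and sorted_ids[i] == sid


def search(query, lists, bitmaps, min_shared, q=2):
    """Return (ids sharing >= min_shared q-grams with query, lists skipped)."""
    if min_shared < 1:
        raise ValueError("min_shared must be at least 1")
    grams = sorted((g for g in qgrams(query, q) if g in lists),
                   key=lambda g: (len(lists[g]), g))
    if len(grams) < min_shared:
        return [], 0
    # Any answer must occur in at least one of the n - T + 1 shortest lists.
    n_short = len(grams) - min_shared + 1
    counts = defaultdict(int)
    for g in grams[:n_short]:
        for sid in lists[g]:
            counts[sid] += 1
    cand_bitmap = 0
    for sid in counts:
        cand_bitmap |= 1 << (sid % BITMAP_BITS)
    skipped = 0
    for g in grams[n_short:]:
        if bitmaps[g] & cand_bitmap == 0:  # bitmap filter: no candidate here
            skipped += 1
            continue
        for sid in counts:
            if contains(lists[g], sid):
                counts[sid] += 1
    return sorted(s for s, c in counts.items() if c >= min_shared), skipped
```

The filter can only produce false positives (two different ids may map to the same bit), never false negatives, so skipping a list on a zero intersection does not change the result.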

Alleviating Syntactic Term Mismatches in Korean Information Retrieval (한국어정보검색에서 구문적 용어불일치 완화방안)

  • Yun, Bo-Hyun;Kim, Sang-Bum;Rim, Hae-Chang
    • Annual Conference on Human and Language Technology / 1998.10c / pp.143-149 / 1998
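  • In Korean information retrieval, syntactic term mismatches between index terms and query terms, caused by compound nouns and noun phrases, have long been a source of problems. This paper proposes a method for alleviating syntactic term mismatches that performs compound noun decomposition and noun phrase normalization together, keeping similarity scores at an appropriate level so that precision improves without lowering recall. The indexing module decomposes compound nouns using statistical information and normalizes noun phrases using dependency relations. The decomposed and normalized keywords are assigned the boundary marker '/', and their weights are computed. The retrieval module uses the boundary information to compute a similarity that takes partial matches into account. Experiments on KTSET 2.0 show that the proposed method alleviates syntactic term mismatches and improves precision without lowering recall.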


Design and Implementation of Video Retrievaling System for Effective Ultrasonograph (효과적인 초음파검사를 위한 동화상 검색시스탬 설계 및 구현)

  • 오태석;오무송
    • The Journal of the Acoustical Society of Korea / v.17 no.6 / pp.79-84 / 1998
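  • Unlike X-ray imaging, ultrasound diagnostic equipment does not harm the human body, allows continuous observation over long periods, and displays images in real time; it is also cheaper and more compact than other equipment. These images can now be stored on high-capacity storage media and played back on a computer. This paper presents a new retrieval method for searching such large volumes of image data. Given a color still image as the query, the system analyzes it automatically and returns, as the query result, information related to similar image data stored in the image database, which makes retrieval easy. To do this, the still image supplied by the user is constructed as a bitmap, and a sub-region of the bitmap's video memory is set as the search target. With this value as the key, the user first sets the desired similarity ratio. The video memory data extracted from each frame of the full video is then compared pixel by pixel with the video memory of the search screen, and the position (point) value and similarity ratio of the image data the user wants are stored. The stored points are ranked by similarity ratio and saved in the database, and the saved candidate images are displayed in rank order, so that users could find the image data they wanted easily and quickly.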

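
The pixel-by-pixel comparison and ranking described in this abstract can be sketched roughly as below. This is a simplified illustration, not the authors' implementation. It assumes frames are 2-D lists of pixel values, that the query region is compared against the same position in every frame, and that two pixels match only when they are exactly equal. The names `similarity_ratio` and `search_frames` are hypothetical.

```python
def similarity_ratio(region, frame, top, left):
    """Fraction of region pixels equal to the frame patch at (top, left)."""
    height, width = len(region), len(region[0])
    matches = sum(
        region[r][c] == frame[top + r][left + c]
        for r in range(height)
        for c in range(width)
    )
    return matches / (height * width)


def search_frames(region, frames, top, left, min_ratio):
    """Return (frame index, ratio) pairs at or above min_ratio, best first."""
    hits = []
    for index, frame in enumerate(frames):
        ratio = similarity_ratio(region, frame, top, left)
        if ratio >= min_ratio:
            hits.append((index, ratio))
    # Rank candidates by similarity ratio, breaking ties by frame order.
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits


# Hypothetical 2x2 frames and a 2x2 query region.
region = [[1, 2], [3, 4]]
frames = [[[1, 2], [3, 4]], [[1, 2], [3, 0]], [[0, 0], [0, 0]]]
print(search_frames(region, frames, 0, 0, 0.5))
```

A real system would likely tolerate small color differences rather than require exact equality, but the abstract does not say how pixel matches are judged.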

Various Paraphrase Generation Using Sentence Similarity (문장 유사도를 이용한 다양한 표현의 패러프레이즈 생성)

  • Park, Da-Sol;Chang, Du-Seong;Cha, Jeong-Won
    • Annual Conference on Human and Language Technology / 2021.10a / pp.576-581 / 2021
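  • A paraphrase is an expression of a sentence using different words that carry the same meaning. Paraphrases play an important role in many areas of natural language processing, including information retrieval, multi-document summarization, and question answering. Obtaining a high-quality paraphrase corpus, however, takes considerable time and cost. To address this problem, this paper builds paraphrase pairs using sentence similarity and then uses the constructed pairs to generate new paraphrases through machine learning. A drawback of the pairs built this way is that they consist only of expressions that already appear in the existing corpus. To overcome this drawback, we ran machine learning experiments and applied a method for extracting candidates that can be regarded as new expressions, and confirmed that expressions new to the existing corpus were generated.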


Query Expansion based on Word Sense Community (유사 단어 커뮤니티 기반의 질의 확장)

  • Kwak, Chang-Uk;Yoon, Hee-Geun;Park, Seong-Bae
    • Journal of KIISE / v.41 no.12 / pp.1058-1065 / 2014
  • To assist users who are executing a search, a query expansion method suggests keywords that are related to an input query. Recently, several studies have suggested keywords by identifying domains with a clustering method applied to the retrieved documents. However, clustering is not well suited to presenting various domains because the number of clusters must be fixed. This paper proposes a method that suggests keywords by finding the various domains related to an input query with a community detection algorithm. The proposed method extracts words from the top 30 retrieved documents and builds communities from the resulting word graph. Keywords representing each community are then derived, and these representative keywords are used for query expansion. To evaluate the proposed method, we compared our results with two baselines: searches performed by the Google search engine, and keyword recommendation using TF-IDF over the search results. The evaluation indicates that the proposed method outperforms the baselines with respect to diversity.

Concept-based Question Analysis for Accurate Answer Extraction (정확한 해답 추출을 위한 개념 기반의 질의 분석)

  • Shin, Seung-Eun;Kang, Yu-Hwan;Ahn, Young-Min;Park, Hee-Guen;Seo, Young-Hoon
    • The Journal of the Korea Contents Association / v.7 no.1 / pp.10-20 / 2007
  • This paper describes a concept-based question analysis that analyzes concepts, which are more important than keywords for accurate answer extraction. Our idea is that well-defined concepts allow us to extract correct answers from various paragraphs with different structures, because the concepts that occur in questions of the same answer type are similar. That is, we analyze the syntactic and semantic role of each word or phrase in a question in order to extract more relevant documents and more accurate answers from them. For each answer type, we define a concept frame composed of the concepts that commonly occur in that type of question, and we analyze a user's question by filling the concept frame with words or phrases. Empirical results show that our concept-based question analysis extracts more accurate answers than conventional approaches. The concept-based approach has the additional merits that it is a language-universal model and can be combined with arbitrary conventional approaches.

Analysis of Korean Patent & Trademark Retrieval Query Log to Improve Retrieval and Query Reformulation Efficiency (질의로그 데이터에 기반한 특허 및 상표검색에 관한 연구)

  • Lee, Jee-Yeon;Paik, Woo-Jin
    • Journal of the Korean Society for information Management / v.23 no.2 / pp.61-79 / 2006
  • To develop recommendations for improving patent and trademark retrieval efficiency, we analyzed 100,016 patent and trademark search requests made by 17,559 unique users over a period of 193 days. By analyzing 2,202 multi-query sessions, in which one user issued two or more queries consecutively, we discovered a number of clues for improving retrieval efficiency. The session analysis also led to suggestions for new system features to help users reformulate queries. Patent and trademark retrieval users were found to be similar to typical web users in certain respects, especially in issuing short queries. However, we also found that they used Boolean operators more than typical web search users. By analyzing the multi-query sessions, we found that users had five intentions in reformulating queries: paraphrasing, specialization, generalization, alternation, and interruption, all of which were also observed among web search engine users.

VRTEC : Multi-step Retrieval Model for Content-based Video Query (VRTEC : 내용 기반 비디오 질의를 위한 다단계 검색 모델)

  • 김창룡
    • Journal of the Korean Institute of Telematics and Electronics T / v.36T no.1 / pp.93-102 / 1999
  • In this paper, we propose a data model and a retrieval method for content-based video queries. After a video is partitioned into frame sets of equal length, called video-windows, each video-window can be mapped to a point in a multidimensional space. A video can then be represented as a trajectory by connecting neighboring video-windows in the multidimensional space. The similarity between two video-windows is defined as the Euclidean distance between the two points in the multidimensional space, and the similarity between two video segments of arbitrary length is obtained by comparing the corresponding trajectories. A new retrieval method with filtering and refinement steps is developed, which returns correct results and speeds up retrieval by approximately 4.7 times compared with a method without the filtering and refinement steps.

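
The video-window model in this abstract can be illustrated with a small sketch. Several details are assumptions made for the example rather than the paper's definitions: each frame is already reduced to a numeric feature vector, a video-window is mapped to the mean of its frame vectors, two equal-length trajectories are compared by the largest Euclidean distance between corresponding points, and the filter step checks only the first window pair before the full comparison. All function names are hypothetical.

```python
import math


def to_trajectory(frame_features, window_len):
    """Split frames into fixed-length windows; map each to its mean point."""
    points = []
    for start in range(0, len(frame_features) - window_len + 1, window_len):
        window = frame_features[start:start + window_len]
        dims = len(window[0])
        points.append(tuple(
            sum(frame[d] for frame in window) / window_len
            for d in range(dims)
        ))
    return points


def trajectory_distance(a, b):
    """Largest Euclidean distance between corresponding trajectory points."""
    if len(a) != len(b):
        raise ValueError("trajectories must have the same number of points")
    return max(math.dist(p, q) for p, q in zip(a, b))


def range_search(query, video, eps):
    """Offsets in `video` where `query` matches within distance eps."""
    hits = []
    for offset in range(len(video) - len(query) + 1):
        # Filter: the first-point distance is a lower bound on the maximum.
        if math.dist(query[0], video[offset]) > eps:
            continue
        # Refine: compare the full trajectories.
        segment = video[offset:offset + len(query)]
        if trajectory_distance(query, segment) <= eps:
            hits.append(offset)
    return hits
```

Because the filter uses a lower bound on the full distance, it never discards a true match, which is the property that lets a filter-and-refine scheme return correct results while doing less work.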

Range Subsequence Matching under Dynamic Time Warping (DTW 거리를 지원하는 범위 서브시퀀스 매칭)

  • Han, Wook-Shin;Lee, Jin-Soo;Moon, Yang-Sae
    • Journal of KIISE:Computing Practices and Letters / v.14 no.6 / pp.559-566 / 2008
  • In this paper, we propose a range subsequence matching method under the dynamic time warping (DTW) distance. We exploit Dual Match, which divides data sequences into disjoint windows and the query sequence into sliding windows. However, Dual Match is known to work under Euclidean distance. We argue that Euclidean distance is a fragile distance and that Dual Match should therefore support DTW. For this purpose, we derive an important new theorem showing the correctness of our approach and provide a detailed algorithm that uses the theorem. Extensive experimental results show that our range subsequence matching performs much better than the sequential scan algorithm.
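
For orientation, the DTW distance and the sequential-scan baseline mentioned in this abstract can be sketched as follows. This is the textbook dynamic-programming formulation, not the paper's Dual Match based algorithm. It assumes an absolute-difference local cost, no warping-window constraint, and candidate subsequences of the same length as the query. The function names are hypothetical.

```python
def dtw(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    # prev[j] holds the cost of aligning the processed prefix of a with b[:j].
    prev = [0.0] + [inf] * len(b)
    for x in a:
        cur = [inf] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            cost = abs(x - b[j - 1])
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[len(b)]


def sequential_scan(data, query, eps):
    """Offsets of length-len(query) subsequences within DTW distance eps."""
    hits = []
    for offset in range(len(data) - len(query) + 1):
        if dtw(data[offset:offset + len(query)], query) <= eps:
            hits.append(offset)
    return hits


# DTW tolerates local stretching that a point-by-point distance cannot.
print(dtw([1, 2, 3], [1, 2, 2, 3]))
```

The scan computes a full quadratic-time DTW at every offset, which is the cost that an index-based method aims to avoid.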