• Title/Summary/Keyword: 색인어 정규화

Search Result 7, Processing Time 0.019 seconds

Retrieval-based Chat Model using Index-Term Normalization and Answer Filtering (색인어 정규화 및 응답 필터링을 이용한 검색기반 채팅 모델)

  • Lee, Hyeon-gu;Kim, Minkyoung;Kim, Jintae;Kim, Harksoo;Lee, Yeonsoo;Choi, Maengsik
    • Annual Conference on Human and Language Technology
    • /
    • 2017.10a
    • /
    • pp.197-200
    • /
    • 2017
  • 채팅 모델은 인간과 컴퓨터가 신변잡기 대화를 나눌 수 있게 해주는 시스템으로 빠른 속도로 발전하는 인공지능 음성언어 비서 시스템에 필수적으로 사용되는 기술이다. 본 논문에서는 검색기반 채팅 모델에서 발생하는 검색 효율 문제와 정확하지 못한 답변을 출력하는 문제를 해결하기 위해 색인어 정규화와 응답 필터링이 적용된 검색기반 채팅 모델을 제안한다. 색인어 정규화를 통해 99.3%의 색인 커버리지를 확보하였으며 필터링 모델을 통해 기존 검색 모델에서보다 향상된 사용자 만족도를 얻었다.

  • PDF

Retrieval-based Chat Model using Index-Term Normalization and Answer Filtering (색인어 정규화 및 응답 필터링을 이용한 검색기반 채팅 모델)

  • Lee, Hyeon-gu;Kim, Minkyoung;Kim, Jintae;Kim, Harksoo;Lee, Yeonsoo;Choi, Maengsik
    • 한국어정보학회:학술대회논문집
    • /
    • 2017.10a
    • /
    • pp.197-200
    • /
    • 2017
  • 채팅 모델은 인간과 컴퓨터가 신변잡기 대화를 나눌 수 있게 해주는 시스템으로 빠른 속도로 발전하는 인공지능 음성언어 비서 시스템에 필수적으로 사용되는 기술이다. 본 논문에서는 검색기반 채팅 모델에서 발생하는 검색 효율 문제와 정확하지 못한 답변을 출력하는 문제를 해결하기 위해 색인어 정규화와 응답 필터링이 적용된 검색기반 채팅 모델을 제안한다. 색인어 정규화를 통해 99.3%의 색인 커버리지를 확보하였으며 필터링 모델을 통해 기존 검색 모델에서보다 향상된 사용자 만족도를 얻었다.

  • PDF

A Study on the Extraction and Utilization of Index from Bibliographic MARC Database (서지마크 데이터베이스로부터의 색인어 추출과 색인어의 검색 활용에 관한 연구 - 경북대학교 도서관 학술정보시스템 사례를 중심으로 -)

  • Park Mi-Sung
    • Journal of Korean Library and Information Science Society
    • /
    • v.36 no.2
    • /
    • pp.327-348
    • /
    • 2005
  • The purpose of this study is to emphasize the importance of index definition and to prepare the basis of optimal index in bibliographic retrieval system. For the purpose, this research studied a index extraction theory on index tag definition and index normalization from the bibliographic marc database and analyzed a retrieval utilization rate of extracted index. In this experiment, we divided index between text-type and code-type about the generated 29,219,853 indexes from 2,200,488 bibliographic records and analyzed utilization rate by the comparison of index-type and index term of web logs. According to the result, the text-type indexes such as title, author, publication, subject are showed high utilization rate while the code-type indexes were showed low utilization rate. So this study suggests that the unused index is removed from index definition to optimize index.

  • PDF

Alleviating Semantic Term Mismatches in Korean Information Retrieval (한국어 정보 검색에서 의미적 용어 불일치 완화 방안)

  • Yun, Bo-Hyun;Park, Sung-Jin;Kang, Hyun-Kyu
    • The Transactions of the Korea Information Processing Society
    • /
    • v.7 no.12
    • /
    • pp.3874-3884
    • /
    • 2000
  • An information retrieval system has to retrieve all and only documents which are relevant to a user query, even if index terms and query terms are not matched exactly. However, term mismatches between index terms and qucry terms have been a serious obstacle to the enhancement of retrieval performance. In this paper, we discuss automatic term normalization between words in text corpora and their application to a Korean information retrieval system. We perform two types of term normalizations to alleviate semantic term mismatches: equivalence class and co-occurrence cluster. First, transliterations, spelling errors, and synonyms are normalized into equivalence classes bv using contextual similarity. Second, context-based terms are normalized by using a combination of mutual information and word context to establish word similarities. Next, unsupervised clustering is done by using K-means algorithm and co-occurrence clusters are identified. In this paper, these normalized term products are used in the query expansion to alleviate semantic tem1 mismatches. In other words, we utilize two kinds of tcrm normalizations, equivalence class and co-occurrence cluster, to expand user's queries with new tcrms, in an attempt to make user's queries more comprehensive (adding transliterations) or more specific (adding spc'Cializationsl. For query expansion, we employ two complementary methods: term suggestion and term relevance feedback. The experimental results show that our proposed system can alleviatl' semantic term mismatches and can also provide the appropriate similarity measurements. As a result, we know that our system can improve the rctrieval efficiency of the information retrieval system.

  • PDF

Alleviating Syntactic Term Mismatches in Korean Information Retrieval (한국어정보검색에서 구문적 용어불일치 완화방안)

  • Yun, Bo-Hyun;Kim, Sang-Bum;Rim, Hae-Chang
    • Annual Conference on Human and Language Technology
    • /
    • 1998.10c
    • /
    • pp.143-149
    • /
    • 1998
  • 한국어 정보검색에서 복합명사와 명사구로 발생하는 색인어와 질의어간의 구문적 용어 불일치는 많은 문제를 일으켜왔다. 본 논문에서는 복합명사 분해와 명사구 정규화를 함께 수행하여 유사도 측정값을 적당히 유지함으로써 재현율을 저하시키지 않고서 정확률을 향상시킬 수 있는 구문적 용어불일치 완화방안을 제시하고자 한다 색인모듈에서는 통계정보를 이용하여 복합명사를 분해하고, 의존관계를 이용하여 명사구를 정규화한다. 분해되고 정규화된 키워드에 경계정보 '/'가 할당되고, 가중치가 계산된다. 검색모듈에서는 경계정보를 이용하여 부분일치를 고려하는 유사도 계산을 수행한다. KTSET 2.0으로 실험한 결과, 제안한 방법은 구문적 용어불일치를 완화할 수 있으며, 재현율을 저하시키지 않고서 정확률을 향상시킬 수 있음을 보인다.

  • PDF

Latent Semantic Indexing Analysis of K-Means Document Clustering for Changing Index Terms Weighting (색인어 가중치 부여 방법에 따른 K-Means 문서 클러스터링의 LSI 분석)

  • Oh, Hyung-Jin;Go, Ji-Hyun;An, Dong-Un;Park, Soon-Chul
    • The KIPS Transactions:PartB
    • /
    • v.10B no.7
    • /
    • pp.735-742
    • /
    • 2003
  • In the information retrieval system, document clustering technique is to provide user convenience and visual effects by rearranging documents according to the specific topics from the retrieved ones. In this paper, we clustered documents using K-Means algorithm and present the effect of index terms weighting scheme on the document clustering. To verify the experiment, we applied Latent Semantic Indexing approach to illustrate the clustering results and analyzed the clustering results in 2-dimensional space. Experimental results showed that in case of applying local weighting, global weighting and normalization factor, the density of clustering is higher than those of similar or same weighting schemes in 2-dimensional space. Especially, the logarithm of local and global weighting is noticeable.

A study on the improving and constructing the content for the Sijo database in the Period of Modern Enlightenment (계몽기·근대시조 DB의 개선 및 콘텐츠화 방안 연구)

  • Chang, Chung-Soo
    • Sijohaknonchong
    • /
    • v.44
    • /
    • pp.105-138
    • /
    • 2016
  • Recently with the research function, "XML Digital collection of Sijo Texts in the Period of Modern Enlightenment" DB data is being provided through the Korean Research Memory (http://www.krm.or.kr) and the foundation for the constructing the contents of Sijo Texts in the Period of Modern Enlightenment has been laid. In this paper, by reviewing the characteristics and problems of Digital collection of Sijo Texts in the Period of Modern Enlightenment and searching for the improvement, I tried to find a way to make it into the content. This database has the primary meaning in the integrating and glancing at the vast amounts of Sijo in the Period of Modern Enlightenment to reaching 12,500 pieces. In addition, it is the first Sijo data base which is provide the variety of search features according to literature, name of poet, title of work, original text, per period, and etc. However, this database has the limits to verifying the overall aspects of the Sijo in the Period of Modern Enlightenment. The title and original text, which is written in the archaic word or Chinese character, could not be searched, because the standard type text of modern language is not formatted. And also the works and the individual Sijo works released after 1945 were missing in the database. It is inconvenient to extract the datum according to the poet, because poets are marked in the various ways such as one's real name, nom de plume and etc. To solve this kind of problems and improve the utilization of the database, I proposed the providing the standard type text of modern language, giving the index terms about content, providing the information on the work format and etc. Furthermore, if the Sijo database in the Period of Modern Enlightenment which is prepared the character of the Sijo Culture Information System could be built, it could be connected with the academic, educational contents. For the specific plan, I suggested as follow, - learning support materials for the Modern history and the national territory recognition on the Modern Age - source materials for studying indigenous animals and plants characters creating the commercial characters - applicability as the Sijo learning tool such as Sijo Game.

  • PDF