• Title/Summary/Keyword: Semantic Expansion

Search Result 71, Processing Time 0.027 seconds

Alleviating Semantic Term Mismatches in Korean Information Retrieval (한국어 정보 검색에서 의미적 용어 불일치 완화 방안)

  • Yun, Bo-Hyun;Park, Sung-Jin;Kang, Hyun-Kyu
    • The Transactions of the Korea Information Processing Society / v.7 no.12 / pp.3874-3884 / 2000
  • An information retrieval system has to retrieve all and only the documents that are relevant to a user query, even if index terms and query terms do not match exactly. However, term mismatches between index terms and query terms have been a serious obstacle to the enhancement of retrieval performance. In this paper, we discuss automatic term normalization between words in text corpora and its application to a Korean information retrieval system. We perform two types of term normalization to alleviate semantic term mismatches: equivalence classes and co-occurrence clusters. First, transliterations, spelling errors, and synonyms are normalized into equivalence classes by using contextual similarity. Second, context-based terms are normalized by using a combination of mutual information and word context to establish word similarities. Next, unsupervised clustering is done with the K-means algorithm, and co-occurrence clusters are identified. These normalized term products are used in query expansion to alleviate semantic term mismatches. In other words, we utilize the two kinds of term normalization, equivalence class and co-occurrence cluster, to expand users' queries with new terms, in an attempt to make the queries more comprehensive (adding transliterations) or more specific (adding specializations). For query expansion, we employ two complementary methods: term suggestion and term relevance feedback. The experimental results show that our proposed system can alleviate semantic term mismatches and can also provide appropriate similarity measurements. As a result, we conclude that our system can improve the retrieval efficiency of the information retrieval system.
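
Two steps named in this abstract, scoring word association with mutual information and expanding a query from normalized term groups, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy data, the group contents, and the function names `pmi_table` and `expand_query` are assumptions made for illustration, and document-level co-occurrence stands in for the paper's word context. The K-means clustering step is omitted.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical normalization products. The paper derives such groups
# automatically from corpora; these are hand-written toy data.
EQUIVALENCE_CLASSES = [
    {"computer", "keompyuteo", "computor"},  # transliteration and misspelling
    {"car", "automobile"},                   # synonyms
]
COOCCURRENCE_CLUSTERS = [
    {"computer", "keyboard", "monitor"},
    {"car", "engine", "tire"},
]


def pmi_table(documents):
    """Pointwise mutual information (log base 2) for every unordered word
    pair that co-occurs in at least one document."""
    n = len(documents)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing each word pair
    for doc in documents:
        words = sorted(set(doc))
        word_df.update(words)
        pair_df.update(combinations(words, 2))
    return {
        (a, b): math.log2((count / n) / ((word_df[a] / n) * (word_df[b] / n)))
        for (a, b), count in pair_df.items()
    }


def expand_query(terms, groups):
    """Return the query terms followed by every new term that shares a
    group with one of them (groups are visited in order, members sorted)."""
    expanded = list(terms)
    seen = set(terms)
    for term in terms:
        for group in groups:
            if term in group:
                for candidate in sorted(group):
                    if candidate not in seen:
                        seen.add(candidate)
                        expanded.append(candidate)
    return expanded
```

Passing `EQUIVALENCE_CLASSES` broadens a query with variant spellings and synonyms, while passing `COOCCURRENCE_CLUSTERS` adds contextually related terms; which group to use would be a choice made by the caller.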


Experiments using query expansion in LSI (LSI에서 질의 확장을 이용한 실험)

  • 안성수;김동주;이기영;김한우
    • Proceedings of the Korean Information Science Society Conference / 1999.10b / pp.151-153 / 1999
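  • Because it is difficult for a user to express, and to satisfy, every information need with a single query, research on query expansion continues. This paper proposes two query expansion methods for LSI (Latent Semantic Indexing): one computes the similarity between the user's query and the terms in the semantic space and expands the query with the top-ranked terms in order, and the other uses LCA (Local Context Analysis). We then analyze the results of applying three weighting schemes to a document collection and discuss the problems that arise in query expansion and directions for future research.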
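
The first method in this abstract, ranking terms by their similarity to the query in the latent semantic space and adding the top-ranked ones, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the 2-dimensional term vectors are invented toy values (a real LSI system would obtain them from a truncated SVD of the term-document matrix), the query vector is approximated as the sum of its term vectors, and the names `TERM_VECTORS`, `cosine`, and `expand_with_lsi` are hypothetical.

```python
import math

# Hypothetical latent-space term vectors (toy values, not from the paper).
TERM_VECTORS = {
    "search":    (0.9, 0.1),
    "retrieval": (0.8, 0.2),
    "index":     (0.7, 0.3),
    "fashion":   (0.1, 0.9),
    "clothes":   (0.2, 0.8),
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_with_lsi(query_terms, term_vectors, top_k=2):
    """Append the top_k terms closest to the query in the latent space.

    Assumption: the query vector is the sum of the vectors of its known terms.
    """
    known = [term_vectors[t] for t in query_terms if t in term_vectors]
    if not known:
        return list(query_terms)
    dims = len(known[0])
    query_vec = tuple(sum(vec[i] for vec in known) for i in range(dims))
    candidates = [t for t in term_vectors if t not in query_terms]
    ranked = sorted(
        candidates,
        key=lambda t: cosine(query_vec, term_vectors[t]),
        reverse=True,
    )
    return list(query_terms) + ranked[:top_k]
```

With the toy vectors above, a query on "search" picks up "retrieval" and "index" rather than the fashion-related terms, because their vectors point in nearly the same direction as the query vector.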


Proposal for Semantic Digital Archive for UNESCO Intangible Cultural Heritage Sites List: Centering on User-Centric Relational Facet Navigation (유네스코 무형문화유산 시맨틱 디지털 아카이브 구축: 이용자 중심 관계형 패싯 네비게이션을 중심으로)

  • Park, Sun-hee
    • Journal of Korean Society of Archives and Records Management / v.19 no.4 / pp.63-86 / 2019
  • UNESCO's site clearly has a good user interface compared with other sites. However, it lacks a structure that lets users curate knowledge in a user-centric way. For users to enjoy such benefits, the knowledge structure must be expressed differently in advance. At present, almost all information systems lack semantic and contextual information, and they are insufficient for interlinking the various kinds of thoughts in our minds. Thus, it is necessary to model in advance what users are likely to think and to provide an interface that they can easily use based on that model. Furthermore, a new structural theory based on semantic technology is needed to make that possible. This proposal therefore presents theoretical and practical insights for implementing a user interface that applies relational facet navigation based on the structural theory. It also suggests a "thinking expansion platform" that supports users' ideation of different concepts, including those unfamiliar to them.

A Query Expansion Technique using Query Patterns in QA systems (QA 시스템에서 질의 패턴을 이용한 질의 확장 기법)

  • Kim, Hea-Jung;Bu, Ki-Dong
    • Journal of Korea Society of Industrial Information Systems / v.12 no.1 / pp.1-8 / 2007
  • When confronted with a query, question answering systems endeavor to extract the most exact answers possible by determining the answer type that fits with the key terms used in the query. However, the efficacy of such systems is limited by the fact that the terms used in a query may be in a syntactic form different to that of the same words in a document. In this paper, we present an efficient semantic query expansion methodology based on query patterns in a question category concept list comprised of terms that are semantically close to terms used in a query. The proposed system first constructs a concept list for each question type and then builds the concept list for each question category using a learning algorithm. The results of the present experiments suggest the promise of the proposed method.


TAKES: Two-step Approach for Knowledge Extraction in Biomedical Digital Libraries

  • Song, Min
    • Journal of Information Science Theory and Practice / v.2 no.1 / pp.6-21 / 2014
  • This paper proposes a novel knowledge extraction system, TAKES (Two-step Approach for Knowledge Extraction System), which integrates advanced techniques from Information Retrieval (IR), Information Extraction (IE), and Natural Language Processing (NLP). In particular, TAKES adopts a novel keyphrase extraction-based query expansion technique to collect promising documents. It also uses a Conditional Random Field-based machine learning technique to extract important biological entities and relations. TAKES is applied to biological knowledge extraction, particularly retrieving promising documents that contain Protein-Protein Interaction (PPI) and extracting PPI pairs. TAKES consists of two major components: DocSpotter, which is used to query and retrieve promising documents for extraction, and a Conditional Random Field (CRF)-based entity extraction component known as FCRF. The present paper investigated research problems addressing the issues with a knowledge extraction system and conducted a series of experiments to test our hypotheses. The findings from the experiments are as follows: First, the author verified, using three different test collections to measure the performance of our query expansion technique, that DocSpotter is robust and highly accurate when compared to Okapi BM25 and SLIPPER. Second, the author verified that our relation extraction algorithm, FCRF, is highly accurate in terms of F-Measure compared to four other competitive extraction algorithms: Support Vector Machine, Maximum Entropy, Single POS HMM, and Rapier.

A Semantic Analysis of the Indeterminacy in Contemporary Fashion - Focusing on Fashion Since 2000 - (현대 패션에 나타난 불확정성의 의미해석 - 2000년대 이후 패션을 중심으로 -)

  • Hwang, Hye-Jin;Kim, Min-Ja
    • Journal of the Korean Society of Costume / v.62 no.5 / pp.1-15 / 2012
  • In a fast changing postmodern society, contemporary fashion is becoming more complicated and ambiguous along with other genres of art than ever before. This phenomenon reigning as a sociocultural paradigm can be defined as 'indeterminacy' and it means 'undecidability'. The purpose of this study is to clarify and analyze the indeterminate characteristics of contemporary fashion reviewing the theoretical background and the architectural formativeness as a comparative research. The core idea of deconstructivism dismantles a causal relationship between function and form in fashion and the conventional notion about clothes. Complexity theory, which is the study of chaotic dynamical systems, suggests the creative idea and concept of infinite possibilities on a formative method. Meanwhile, catastrophe theory of discontinuous change can be used as interpretative strategies for the process of deconstruction and reconstruction. As a result of this study, the indeterminacy of fashion can be analyzed into five semantic categories: irregularity, immateriality, randomness, complexity and changeability. The intrinsic value of the indeterminacy in contemporary fashion is the interaction with a sociocultural ideology and a technological environment as well as an expansion of formative expression. To conclude, it can be said that the indeterminacy in fashion is a new interpretation of the relationship among body and space, clothes and society.

Intelligent Product Search Agent based on SWRL (시맨틱 웹 규칙 언어를 이용한 지능형 상품 정보 검색 에이전트 개발)

  • Kim, U-Ju;Kim, Jeong-Myeong;Choe, Dae-U
    • Proceedings of the Korea Intelligent Information System Society Conference / 2005.05a / pp.316-320 / 2005
  • We developed an Intelligent Product Search Agent based on SWRL. The agent can search for product information using knowledge (facts and rules) on the web and compare the prices of the retrieved products while taking delivery rates into account. Existing keyword-based product search engines are poor at finding the intended products even when the user already has perfect knowledge of them, and if the user has insufficient knowledge, searching becomes impossible. In addition, existing price comparison shopping malls offer comparison services that consider the total price (product price, taxes, delivery rates), but these services are valid only for a single product and are hard to expand and update because they are built on hard-coded programs rather than a rule base. If appropriate knowledge is available on the Semantic Web and it makes product information retrieval possible, these problems can be solved. In this research, we developed an Intelligent Product Search Agent based on SWRL that searches product information efficiently by letting the agent handle facts and rules by itself.


Semantic Information Retrieval Based on User-Word Intelligent Network (U-WIN 기반의 의미적 정보검색 기술)

  • Im, Ji-Hui;Choi, Ho-Seop;Ock, Cheol-Young
    • Proceedings of the Korea Contents Association Conference / 2006.11a / pp.547-550 / 2006
  • An information retrieval system's performance is judged by how accurately it retrieves the information the user wants. A search that uses only a homograph as the query returns either a mix of documents related to each meaning of the word or documents concentrated on one specific meaning. In this paper, we therefore suggest a semantic information retrieval technique that uses the relations within the User-Word Intelligent Network (U-WIN) to disambiguate queries. In our experiment, queries are divided into two classes, homographs used as terminology and general homographs, and the expanded query takes the form "query + hypernym". We found that the average precision was 73.5% for web-document-only search and 70% for integrated search on two portal sites. This suggests that the U-WIN-based semantic information retrieval technique can be used effectively in an IR system.
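
The "query + hypernym" expansion form mentioned in this abstract can be illustrated with a short Python sketch. The sense inventory below is a hand-written stand-in, not U-WIN data, and the names `HYPERNYMS`, `expand_with_hypernym`, and `all_expansions` are hypothetical; how a sense is chosen for a given query is outside this sketch.

```python
# Hypothetical sense inventory: each sense of a homograph maps to a hypernym.
# A real system would look these relations up in a lexical network.
HYPERNYMS = {
    "bat": {"animal": "mammal", "sports": "equipment"},
    "bank": {"finance": "institution", "river": "slope"},
}


def expand_with_hypernym(query, sense, hypernyms):
    """Build a 'query + hypernym' string for the chosen sense.

    Returns the query unchanged when the word or the sense is unknown.
    """
    hypernym = hypernyms.get(query, {}).get(sense)
    return f"{query} {hypernym}" if hypernym else query


def all_expansions(query, hypernyms):
    """List one expanded query per known sense, sorted by sense label."""
    senses = hypernyms.get(query, {})
    return [f"{query} {senses[s]}" for s in sorted(senses)]
```

Appending the hypernym narrows a homograph to one meaning, so "bat mammal" and "bat equipment" would be issued as two distinct searches instead of one ambiguous one.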


A Semantic-Based Feature Expansion Approach for Improving the Effectiveness of Text Categorization by Using WordNet (문서범주화 성능 향상을 위한 의미기반 자질확장에 관한 연구)

  • Chung, Eun-Kyung
    • Journal of the Korean Society for Information Management / v.26 no.3 / pp.261-278 / 2009
  • Identifying optimal feature sets in Text Categorization (TC) is crucial for improving effectiveness. In this study, experiments on feature expansion were conducted using author-provided keyword sets and article titles from typical scientific journal articles. The tool used for expanding feature sets is WordNet, a lexical database for English words. Given a data set and a lexical tool, this study showed that feature expansion with synonymous relationships was significantly effective in improving the results of TC. The experimental results indicated that when feature sets were expanded with synonyms based on classifier names, the effectiveness of TC improved considerably regardless of word sense disambiguation.
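
Synonym-based feature expansion of the kind described in this abstract can be sketched in a few lines of Python. The small dictionary below is a stand-in for WordNet lookups, and the names `SYNONYMS` and `expand_features` are assumptions for illustration only. In line with the abstract's remark about word sense disambiguation, the sketch adds every listed synonym without choosing a sense.

```python
from collections import Counter

# Hypothetical synonym table standing in for a lexical database lookup.
SYNONYMS = {
    "car": ["automobile", "auto"],
    "doctor": ["physician"],
}


def expand_features(tokens, synonyms):
    """Return bag-of-words counts where each token occurrence also adds
    one count to each of its synonyms (no sense disambiguation)."""
    features = Counter(tokens)
    for token in tokens:
        for synonym in synonyms.get(token, []):
            features[synonym] += 1
    return features
```

The expanded counts could then be fed to any text classifier in place of the original bag of words, so that documents using "automobile" and documents using "car" share features.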

Question Analysis and Expansion based on Semantics (의미 기반의 질의 분석 및 확장)

  • Shin, Seung-Eun;Park, Hee-Guen;Seo, Young-Hoon
    • The Journal of the Korea Contents Association / v.7 no.7 / pp.50-59 / 2007
  • This paper describes question analysis and expansion based on semantics for efficient information retrieval. The results of all information retrieval systems include many non-relevant documents because the index cannot naturally reflect the contents of documents and because the queries used in such systems cannot carry enough of the information in the user's question. To solve this problem, we analyze the user's question semantically, determine the answer type, and extract semantic features. We then expand the user's question using these features and the syntactic structures that are used to represent the answer. Our similarity measure ranks documents that contain the expanded queries in high positions. In particular, we found that efficient document retrieval is possible through semantics-based question analysis and expansion for natural language questions that are comparatively short but fully express the users' information needs.