• Title/Abstract/Keyword: semantic document-retrieval

Search Result 59, Processing Time 0.028 seconds

Automatic Indexing of Korean through Syntactic and Semantic Analysis (구문 및 의미 분석을 통한 한국어 자동 색인)

  • 최기선
    • Journal of the Korean Society for Information Management / v.8 no.2 / pp.96-107 / 1991
  • The inherent limitation of conventional approaches to automatic indexing is that they compute the relevance between index terms and documents only indirectly or relatively. As an alternative, the analysis of document texts seeks a means of establishing the direct relevance of terms. More rigorous linguistic analysis gives a better chance of relevance. Various semantic topologies among terms may provide sufficient quality for relevance. The enhanced and guaranteed relevance will allow high retrieval precision. Along these lines, the ongoing project at KAIST pursues a user-oriented retrieval system, which raises many other issues that are not common in the traditional perspective.


A Method on Associated Document Recommendation with Word Correlation Weights (단어 연관성 가중치를 적용한 연관 문서 추천 방법)

  • Kim, Seonmi;Na, InSeop;Shin, Juhyun
    • Journal of Korea Multimedia Society / v.22 no.2 / pp.250-259 / 2019
  • Big data processing technology and artificial intelligence (AI) are attracting increasing attention. Natural language processing is an important research area of artificial intelligence. In this paper, we use Korean news articles to extract topic distributions in documents and word distribution vectors in topics through LDA-based topic modeling. Then, we use Word2vec to vectorize words and generate a weight matrix to derive a relevance score that considers the semantic relationships between words. We propose a way to recommend documents in descending order of score.
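
The scoring idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vocabulary, the two-dimensional embeddings (standing in for Word2vec vectors), the per-document word weights (standing in for LDA-derived distributions), and every function name below are assumptions made for illustration only. The word correlation weight matrix is taken here to be the pairwise cosine similarity of the word vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def correlation_matrix(vocab, embeddings):
    """Word-by-word weight matrix built from embedding similarity."""
    return [[cosine(embeddings[a], embeddings[b]) for b in vocab] for a in vocab]

def relevance_score(doc_a, doc_b, weights):
    """Bilinear score doc_a^T W doc_b over per-word document weights."""
    return sum(
        doc_a[i] * weights[i][j] * doc_b[j]
        for i in range(len(doc_a))
        for j in range(len(doc_b))
    )

def recommend(query_doc, candidates, weights):
    """Return (name, score) pairs sorted from highest to lowest score."""
    scored = [(name, relevance_score(query_doc, vec, weights))
              for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy data: "economy" and "market" are close, "soccer" is not.
vocab = ["economy", "market", "soccer"]
embeddings = {"economy": [1.0, 0.0], "market": [0.9, 0.1], "soccer": [0.0, 1.0]}
weights = correlation_matrix(vocab, embeddings)

query_doc = [1.0, 0.0, 0.0]  # a document that only mentions "economy"
candidates = {"finance": [0.0, 1.0, 0.0], "sports": [0.0, 0.0, 1.0]}
ranked = recommend(query_doc, candidates, weights)
```

In this toy setup the "finance" document shares no word with the query, yet it ranks above "sports" because the weight matrix carries the similarity between "economy" and "market" into the score.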

A Methodology of the Information Retrieval System Using Fuzzy Connection Matrix and Document Connectivity Order (색인어 퍼지 관계와 서열기법을 이용한 정보 검색 방법론)

  • Kim, Chul;Lee, Seung-Chai;Kim, Byung-Ki
    • The Transactions of the Korea Information Processing Society / v.3 no.5 / pp.1160-1169 / 1996
  • In this study, an information retrieval experiment using a fuzzy connection matrix of keywords was conducted. A retrieval query was constructed from keywords and Boolean operators such as AND, OR, and NOT. In a workstation environment, the fuzzy retrieval system proved considerably more effective than the system using crisp set theory. Both the recall ratio and the precision ratio showed that the proposed technique could be a possible alternative for future information retrieval. Some special features of this experimental system were: ranking the results in order of connectivity, letting the retrieval results respond flexibly to changes in the threshold value, and aligning the retrieval process with the retrieval semantics by treating the averse-connectivity (fuzzy value) as a semantic approximation between keywords.


Similarity checking between XML tags through expanding synonym vector (유사어 벡터 확장을 통한 XML태그의 유사성 검사)

  • Lee, Jung-Won;Lee, Hye-Soo;Lee, Ki-Ho
    • Journal of KIISE:Software and Applications / v.29 no.9 / pp.676-683 / 2002
  • The success of XML (eXtensible Markup Language) is primarily based on its flexibility: everybody can define the structure of XML documents that represent information in the form he or she desires. XML is so flexible that XML documents cannot be automatically provided with underlying semantics. Different tag sets, different names for elements or attributes, or different document structures in general hinder the task of classifying and clustering XML documents precisely. In this paper, we design and implement a system that checks the semantic similarity between XML tags. First, the system extracts the underlying semantics of tags and then expands the synonym set of tags using a WordNet thesaurus and a user-defined word library that supports abbreviated forms and compound words for XML tags. Second, considering the relative importance of XML tags in XML documents, we extend the conventional vector space model, which is the most widely used document model in the information retrieval field. Using this method, we have been able to check the similarity between XML tags that are represented by different tag names.
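
As a rough illustration of the synonym-vector idea described in this abstract, the sketch below expands each tag into a weighted term vector and compares the vectors with cosine similarity. It is a simplified sketch under stated assumptions, not the authors' system: the `SYNONYMS` and `ABBREVIATIONS` tables are hypothetical stand-ins for the WordNet thesaurus and the user-defined word library, the synonym weight of 0.5 is arbitrary, and the paper's weighting by the relative importance of tags within documents is omitted.

```python
import math

# Hypothetical stand-ins for a thesaurus and a user-defined word library.
SYNONYMS = {
    "author": {"writer", "creator"},
    "writer": {"author"},
    "price": {"cost"},
}
ABBREVIATIONS = {"auth": "author"}

def expand(tag, synonym_weight=0.5):
    """Map a tag to a sparse vector: the tag itself plus weighted synonyms."""
    base = ABBREVIATIONS.get(tag.lower(), tag.lower())
    vector = {base: 1.0}
    for synonym in SYNONYMS.get(base, ()):
        vector.setdefault(synonym, synonym_weight)
    return vector

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def tag_similarity(tag_a, tag_b):
    """Similarity of two XML tag names after synonym expansion."""
    return cosine(expand(tag_a), expand(tag_b))

same = tag_similarity("author", "author")
related = tag_similarity("auth", "writer")
unrelated = tag_similarity("author", "price")
```

Without expansion, `auth` and `writer` would share no term and score zero; with it, the two tags overlap on both `author` and `writer`, while `author` and `price` remain unrelated.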

A Ranking Technique of XML Documents using Path Similarity for Expanded Query Processing (확장된 질의 처리를 위해 경로간 의미적 유사도를 고려한 XML 문서 순위화 기법)

  • Kim, Hyun-Joo;Park, So-Mi;Park, Seog
    • Journal of KIISE:Databases / v.37 no.2 / pp.113-120 / 2010
  • XML is broadly used for data storage and processing. XML is characterized by its structure, and users can query with XPath when they need information from a data document. An XPath query can be processed when the terms and structure of the document and the query match each other. However, many data documents today are written with different terminology and structure, so users cannot know the exact form of the target data. In fact, a target data document may well contain the information the user is looking for, or something similar. Accordingly, a user query should be processed even when its term usage or structural characteristics differ slightly from those of the data document. To that end, we suggest an XML document ranking method based on path similarity. The method measures the semantic similarity between a user query and a data document in three steps, using position, node, and relaxation factors.

Semantic Topic Selection Method of Document for Classification (문서분류를 위한 의미적 주제선정방법)

  • Ko, kwang-Sup;Kim, Pan-Koo;Lee, Chang-Hoon;Hwang, Myung-Gwon
    • Journal of the Korea Institute of Information and Communication Engineering / v.11 no.1 / pp.163-172 / 2007
  • The web, as a global network, includes text documents, video, sound, and so on, and connects distributed information using links. Through its development, the web has accumulated abundant information, most of which is text-based documents. Most users use the web to retrieve the information they want. Numerous studies have therefore tried to retrieve text documents using many methods, such as probability, statistics, vector similarity, Bayesian approaches, and so on. These studies, however, could not consider both the subject and the semantics of documents. As a result, users have to search again by hand. Korean documents are especially hard to find because research on Korean document classification is insufficient. To overcome these problems, we propose a Korean document classification method for semantic retrieval. The method first extracts the TF value and RV value of the concepts included in a document, then maps them into U-WIN, a Korean vocabulary dictionary, to select the topic of the document. The method can classify documents semantically, and experiments showed its efficiency.

Question Analysis and Expansion based on Semantics (의미 기반의 질의 분석 및 확장)

  • Shin, Seung-Eun;Park, Hee-Guen;Seo, Young-Hoon
    • The Journal of the Korea Contents Association / v.7 no.7 / pp.50-59 / 2007
  • This paper describes question analysis and expansion based on semantics for efficient information retrieval. The results of all information retrieval systems include many non-relevant documents because the index cannot naturally reflect the contents of documents and because the queries used in information retrieval systems cannot represent enough of the information in the user's question. To solve this problem, we analyze the user's question semantically, determine the answer type, and extract semantic features. We then expand the user's question using them and the syntactic structures that are used to represent the answer. Our similarity measure ranks documents that include the expanded queries in high positions. In particular, we found that efficient document retrieval is possible through semantics-based question analysis and expansion on natural language questions, which are comparatively short but fully express the users' information demand.

Semantic Search System using Ontology-based Inference (온톨로지기반 추론을 이용한 시맨틱 검색 시스템)

  • Ha Sang-Bum;Park Yong-Tack
    • Journal of KIISE:Software and Applications / v.32 no.3 / pp.202-214 / 2005
  • The semantic web is a web paradigm that represents not general links between documents but the semantics and relations of documents. In addition, it enables software agents to understand the semantics of documents. We propose a semantic search based on inference with ontologies, which has the following characteristics. First, our search engine uses explicit ontologies for reasoning, enabling retrieval even when a search keyword differs from the keywords in documents. Second, even when the concepts of two ontologies do not match exactly, similar results can be found through a rule-based translator and ontological reasoning. Third, our approach enables the search engine to increase accuracy and precision by using explicit ontologies to reason about the meanings of documents rather than guessing those meanings from keywords alone. Fourth, the domain ontology lets users pose more detailed queries through an ontology-based automated query generator whose search area and accuracy are similar to those of NLP. Fifth, it enables agents to automatically search not only for documents containing a keyword but also for user-preferred information and knowledge from ontologies. It can search more accurately than current retrieval systems, which use database queries or keyword matching. We demonstrate that our system, which uses ontologies and inference based on explicit ontologies, can perform better than a keyword matching approach.

A Case Study on the Types of Queries' Relations for Recognizing User intention (검색의도 파악을 위한 질의어 관계유형에 관한 사례연구)

  • Kwon, Soon-Jin;Kim, Won-Il;Yoo, Seong-Joon
    • Journal of the Korean Institute of Intelligent Systems / v.21 no.4 / pp.414-422 / 2011
  • IR (Information Retrieval) systems use methods that compare the relationships between a query and an index to identify documents that may fit the user's query keyword. However, these methods usually ignore the importance of relations that are not expressed in the query. Therefore, in this study, we describe how to refine the queries' relations from keywords and reveal the hidden intent. A useful relationship between query and keyword in IR was studied, and we classified the types of relation. First of all, we surveyed foreign and domestic research on semantic relationships and ontologies, analyzed semantic network practices and information retrieval technology, and extracted and classified the types of relationships from real-world data of sites in which information retrieval technologies are applied. Next, we sought to solve the problems that occur frequently in situations searchers typically face. In current search technology, a mass of search results is poured out by simply comparing a query with index terms. Therefore, an intelligent search that fits users' intent is needed. The hidden relationships between two queries that identify the searcher's intent have to be revealed. By analyzing practical query cases and classifying them into nine relationship types, we propose a method for designing relation revealing and role naming, and we also illustrate the limitations of that method.

A Comparative Study of XML and HTML: Focusing on Their Characteristics and Retrieval Functions (디지털도서관 문서양식으로서의 XML과 HTML의 특성 및 검색 기능 비교 연구)

  • 김현희;장혜원
    • Journal of the Korean Society for Information Management / v.16 no.2 / pp.105-134 / 1999
  • For efficient and precise searches in the Web environment, resources should be coded in a structured way. HTML does not cover semantic structure because of its fixed tagging. XML, which has emerged as an alternative standard markup language, uses custom tags that allow structural searching. Therefore, this study aims to compare XML with HTML in terms of their characteristics and retrieval functions. In order to test the retrieval functions of XML- and HTML-based systems, we constructed an experimental XML-based system. The XML-based system has several advantages over the HTML system. However, some improvements are needed to make the XML system more comprehensive and effective. First, XML document search engines with user-friendly interfaces are needed. Second, popular Web browsers such as Explorer and Communicator need to support the XML 1.0 specification completely. Third, an Open DTD format, which will allow information retrieval systems to retrieve documents and compress them into one single format, is also needed to control Web documents more efficiently.
