• Title/Summary/Keyword: Query Expansion (질의어 확장)


An Electronic Dictionary Structure supporting Truncation Search (절단검색을 지원하는 전자사전 구조)

  • 김철수
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.9 no.1
    • /
    • pp.60-69
    • /
    • 2003
  • In an Information Retrieval System (IRS) based on an inverted file structure, related documents can be retrieved when the searcher knows the complete words of the search fields. In many cases, however, the searcher knows only a partial string of a word. In such cases, if the searcher can search the indexes that include the known partial string, the related documents can still be retrieved. Furthermore, when few documents are retrieved, a method is needed to find all documents whose indexes include the known partial string. To satisfy these requests, the searcher should be able to construct a query that uses term truncation, and the IRS should have an electronic dictionary that supports truncated search terms. This paper designs and implements an electronic dictionary (ED) structure that supports truncation search efficiently. The ED guarantees a very fast and constant search time for a term entry and its inversely alphabetized entry, regardless of the number of inserted words. To support truncation search efficiently we use a trie structure, and to achieve fast search times we implement it with arrays. When searching for a truncated term, the search time is reduced by minimizing the length of the string to be expanded.
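
A minimal Python sketch of the core idea may help: a forward trie answers right-truncated queries ("comput*") by expanding the subtree below the prefix, and a second trie over reversed words, standing in for the paper's inversely alphabetized entries, answers left-truncated queries ("*ing"). This is an illustrative reconstruction, not the paper's array-based implementation; all names are hypothetical.

```python
# Illustrative trie-based dictionary for truncation search (hypothetical,
# not the paper's array-based structure).

class TrieNode:
    def __init__(self):
        self.children = {}    # char -> TrieNode
        self.is_word = False

class TruncationDictionary:
    def __init__(self):
        self.forward = TrieNode()
        self.reverse = TrieNode()   # holds each word reversed

    def insert(self, word):
        for root, w in ((self.forward, word), (self.reverse, word[::-1])):
            node = root
            for ch in w:
                node = node.children.setdefault(ch, TrieNode())
            node.is_word = True

    def _walk(self, root, s):
        node = root
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def _expand(self, node, prefix):
        # The paper reduces search time by minimizing the length of the
        # string that must be expanded at this step.
        if node.is_word:
            yield prefix
        for ch, child in node.children.items():
            yield from self._expand(child, prefix + ch)

    def right_truncated(self, prefix):          # matches "prefix*"
        node = self._walk(self.forward, prefix)
        return [] if node is None else sorted(self._expand(node, prefix))

    def left_truncated(self, suffix):           # matches "*suffix"
        node = self._walk(self.reverse, suffix[::-1])
        if node is None:
            return []
        return sorted(w[::-1] for w in self._expand(node, suffix[::-1]))

d = TruncationDictionary()
for w in ("compute", "computer", "computing", "indexing"):
    d.insert(w)
print(d.right_truncated("comput"))   # ['compute', 'computer', 'computing']
print(d.left_truncated("ing"))       # ['computing', 'indexing']
```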

KNetIRS : Information Retrieval System using Keyword Network (KNetIRS : 키워드망을 이용한 정보검색 시스템)

  • Woo, Sun-Mi;Yoo, Chun-Sik;Lee, Chong-Deuk;Kim, Yong-Sung
    • The Transactions of the Korea Information Processing Society
    • /
    • v.4 no.9
    • /
    • pp.2185-2196
    • /
    • 1997
  • Existing information retrieval systems use a thesaurus to retrieve the desired information even when the query is not accurate. However, the cost of implementing and maintaining a thesaurus is very high, and it cannot guarantee the success of every search or retrieval operation. In this paper, an Information Retrieval System using a Keyword Network (KNetIRS), designed and implemented to solve these problems, is introduced. The Keyword Network is composed of keywords extracted from documents. KNetIRS finds the appropriate documents by using the Keyword Network, which is based on the concept of an inverted file. In addition, KNetIRS can carry out query expansion using the Keyword Network Browser, and it handles compound terms such as "정보 검색" ("information retrieval") and its components "정보" ("information") and "검색" ("retrieval") by defining and implementing a split function.
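
The abstract's two mechanisms, expansion over a keyword network and a split function for compound terms, can be sketched as follows. This is a hypothetical reconstruction under the assumption that the network links keywords co-occurring in a document; it is not KNetIRS's actual design, and all names are illustrative.

```python
# Hypothetical sketch: keyword network over an inverted file, with a
# split() that breaks a compound query like "정보 검색" into "정보" and "검색".

from collections import defaultdict

class KeywordNetwork:
    def __init__(self):
        self.inverted = defaultdict(set)   # keyword -> doc ids
        self.edges = defaultdict(set)      # keyword -> co-occurring keywords

    def add_document(self, doc_id, keywords):
        for kw in keywords:
            self.inverted[kw].add(doc_id)
            self.edges[kw].update(k for k in keywords if k != kw)

    def split(self, query):
        # A compound query is searched both as-is and as its components.
        parts = query.split()
        return [query] + parts if len(parts) > 1 else parts

    def expand(self, keyword):
        # Query expansion: the keyword plus its neighbours in the network.
        return {keyword} | self.edges.get(keyword, set())

    def retrieve(self, query):
        docs = set()
        for term in self.split(query):
            for kw in self.expand(term):
                docs |= self.inverted.get(kw, set())
        return docs

net = KeywordNetwork()
net.add_document(1, ["정보 검색", "정보", "시소러스"])
net.add_document(2, ["검색", "질의어 확장"])
print(net.retrieve("정보 검색"))   # -> {1, 2}
```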


Applying Ontologies to UCI for the Efficient Search and Management of Digital Contents (디지털 콘텐츠의 효율적 검색과 관리를 위한 UCI 식별체계의 온톨로지 적용)

  • Ha, Eun-Ok;Kim, Yoon-Ho
    • The Journal of Society for e-Business Studies
    • /
    • v.14 no.4
    • /
    • pp.215-228
    • /
    • 2009
  • UCI (Universal Content Identifier), a digital content identifier, is an identification system based on URN (Uniform Resource Name) for the transparent distribution, efficient search, and management of digital contents. Digital contents assigned a UCI number need metadata in order to serve exactly the contents that users want. However, the metadata provided by UCI alone is insufficient to represent the varied information of digital contents; ontologies that standardize and explicitly define the semantic relations between the UCI metadata elements are needed for better representation of information and for efficient search and management of digital contents. In this paper, we extend the concept relations between metadata elements with an ontology, design a domain ontology that enables semantic-based search by expanding the UCI metadata, and show through various queries that search and management are more efficient than with the UCI metadata alone.
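
As a loose illustration of the approach, the sketch below recasts UCI-style metadata as ontology triples so that a query can follow semantic relations (here a hypothetical skos:broader link) that flat metadata cannot express. The identifiers and property names are made up for the example and are not the UCI standard vocabulary or the paper's ontology.

```python
# Hypothetical triple store: UCI metadata extended with ontology relations.

triples = set()

def add(s, p, o):
    triples.add((s, p, o))

def query(s=None, p=None, o=None):
    # Simple triple-pattern matching; None acts as a wildcard.
    return [(s2, p2, o2) for (s2, p2, o2) in triples
            if s in (None, s2) and p in (None, p2) and o in (None, o2)]

# A UCI-identified content item with ontology-style relations (all made up).
add("uci:G100+01", "dc:title", "Digital Contents Item")
add("uci:G100+01", "dc:subject", "query expansion")
add("query expansion", "skos:broader", "information retrieval")

def items_under(concept):
    # Semantic-based search: items whose subject falls under a broader
    # concept, a relation flat UCI metadata alone cannot express.
    narrower = {s for (s, p, o) in query(p="skos:broader", o=concept)}
    hits = []
    for term in narrower | {concept}:
        hits += [s for (s, _, _) in query(p="dc:subject", o=term)]
    return hits

print(items_under("information retrieval"))  # -> ['uci:G100+01']
```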


Development of Information Extraction System from Multi Source Unstructured Documents for Knowledge Base Expansion (지식베이스 확장을 위한 멀티소스 비정형 문서에서의 정보 추출 시스템의 개발)

  • Choi, Hyunseung;Kim, Mintae;Kim, Wooju;Shin, Dongwook;Lee, Yong Hun
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.4
    • /
    • pp.111-136
    • /
    • 2018
  • In this paper, we propose a methodology that extracts answer information for queries from various types of unstructured documents collected from multiple web sources, in order to expand a knowledge base. The proposed methodology consists of the following steps: 1) collect relevant documents from Wikipedia, Naver encyclopedia, and Naver news for "subject-predicate" separated queries, and classify the suitable documents; 2) determine whether each sentence is suitable for information extraction and derive its confidence; 3) based on the predicate feature, extract the information from the suitable sentences and derive the overall confidence of the extraction result. To evaluate the performance of the information extraction system, we selected 400 queries from SK Telecom's artificial intelligence speaker; the proposed system is confirmed to show a higher performance index than the baseline model. The contribution of this study is a sequence tagging model based on a bidirectional LSTM-CRF that uses the predicate feature of the query, with which we developed a robust model that maintains high recall even on the various types of unstructured documents collected from multiple sources. Information extraction for knowledge base expansion must take into account the heterogeneous characteristics of source-specific document types, and the proposed methodology proved to extract information effectively from various document types compared to the baseline; previous research suffers from poor performance when extracting information from document types different from the training data. In addition, by predicting the suitability of documents and sentences before the extraction step, this study prevents unnecessary extraction attempts on documents that do not contain the answer, providing a method by which precision can be maintained even in a real web environment. Because the target is unstructured documents on the real web, it cannot be guaranteed that a document contains the correct answer, and previous machine reading comprehension studies show low precision in this setting because they frequently attempt to extract an answer even from documents with no correct answer; the policy of predicting document and sentence suitability is meaningful in that it helps maintain extraction performance in this environment. The limitations of this study and future research directions are as follows. First, data preprocessing: the unit of knowledge extraction is determined through morphological analysis based on the open-source KoNLPy Python package, so extraction can fail when the morphological analysis is not performed properly; an improved morphological analyzer is needed to enhance extraction performance. Second, entity ambiguity: the system cannot distinguish entities that share the same name, so if several people with the same name appear in the news, it may not extract information about the intended one; future research needs measures to disambiguate such entities. Third, evaluation query data: we selected 400 user queries collected from SK Telecom's interactive artificial intelligence speaker and built an evaluation data set of 2,800 documents (400 queries × 7 articles per query: 1 Wikipedia, 3 Naver encyclopedia, and 3 Naver news articles), judging whether each contains the correct answer. To ensure the external validity of the study, it is desirable to evaluate the system on more queries, which is a costly manual activity; future research should also develop a Korean benchmark data set for information extraction from multi-source web documents so that results can be evaluated more objectively.
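
A schematic sketch of the three-step pipeline is given below. The document and sentence suitability scorers and the extractor are stubs: in the paper the extractor is a trained bidirectional LSTM-CRF tagger conditioned on the predicate feature, which cannot be reproduced here, so all names, thresholds, and heuristics are illustrative assumptions.

```python
# Schematic pipeline: document filtering -> sentence scoring -> extraction.
# All three stages are stubs standing in for the paper's trained models.

def document_is_suitable(doc, query):
    # Step 1 (stub): keep only documents likely to contain the answer.
    return query["subject"] in doc["text"]

def sentence_confidence(sentence, query):
    # Step 2 (stub): confidence that this sentence supports extraction.
    score = 0.0
    if query["subject"] in sentence:
        score += 0.5
    if query["predicate"] in sentence:
        score += 0.5
    return score

def extract_answer(sentence, query):
    # Step 3 (stub): the paper uses a BiLSTM-CRF sequence tagger with the
    # predicate feature; here we just take the span after the predicate.
    idx = sentence.find(query["predicate"])
    if idx < 0:
        return None
    return sentence[idx + len(query["predicate"]):].strip(" .")

def pipeline(docs, query, threshold=0.9):
    answers = []
    for doc in docs:
        if not document_is_suitable(doc, query):
            continue  # avoids extraction attempts on answer-less documents
        for sent in doc["text"].split("."):
            conf = sentence_confidence(sent, query)
            if conf >= threshold and (ans := extract_answer(sent, query)):
                answers.append((ans, conf))
    return answers

query = {"subject": "Seoul", "predicate": "capital of"}
docs = [{"text": "Seoul is the capital of South Korea."}]
print(pipeline(docs, query))  # -> [('South Korea', 1.0)]
```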

Vibrio ordalii, the causative agent of massive mortality in cultured rockfish (Sebastes schlegeli) larvae (양식 조피볼락(Sebastes schlegeli) 치어의 대량폐사 원인인 비브리오병에 관하여)

  • Park, Sung-Woo;Kim, Young-Gill;Choi, Dong-Lim
    • Journal of fish pathology
    • /
    • v.9 no.2
    • /
    • pp.137-145
    • /
    • 1996
  • A specific disease syndrome that led to massive mortality of rockfish (Sebastes schlegeli) larvae in marine hatcheries in the Chungnam area during 1995~1996 was studied. The causative agent isolated from the diseased or dead larvae was identified as Vibrio ordalii on the basis of its biochemical and biological characteristics. In experimental challenges against 0-summer and 1-summer fish conducted at two temperatures, $18^{\circ}C$ and $25^{\circ}C$, Vibrio ordalii showed higher virulence to 0-summer fish at $18^{\circ}C$ than to 1-summer fish at $25^{\circ}C$. These results were consistent with field data obtained during epizootic outbreaks in the farms. Moribund and dead larvae presented telangiectasis of the secondary gill lamellae and brain, detachment of the respiratory epithelium, atrophy of hepatic cells, and necrosis of the kidney associated with the presence of the bacteria, but the digestive tissue of these fish showed no significant change.


A Load Balancing Method using Partition Tuning for Pipelined Multi-way Hash Join (다중 해시 조인의 파이프라인 처리에서 분할 조율을 통한 부하 균형 유지 방법)

  • Mun, Jin-Gyu;Jin, Seong-Il;Jo, Seong-Hyeon
    • Journal of KIISE:Databases
    • /
    • v.29 no.3
    • /
    • pp.180-192
    • /
    • 2002
  • We investigate the effect of data skew in join attributes on the performance of a pipelined multi-way hash join method, and propose two new hash join methods for the shared-nothing multiprocessor environment. The first proposed method allocates buckets statically in round-robin fashion, and the second allocates buckets dynamically based on a frequency distribution. Using hash-based joins, multiple joins can be pipelined so that the early results from a join, before the whole join is completed, are sent to the next join without being staged on disk. The shared-nothing multiprocessor architecture is known to be more scalable for supporting very large databases; however, this hardware structure is very sensitive to data skew. Unless the pipelined execution of multiple hash joins includes a dynamic load balancing mechanism, the skew effect can severely deteriorate system performance. In this paper, we derive an execution model of the pipeline segment and a cost model, and develop a simulator for the study. As shown by our simulation over a wide range of parameters, join selectivities and relation sizes deteriorate system performance as the degree of data skew grows, but the proposed method, using a large number of buckets and a tuning technique, offers substantial robustness across a wide range of skew conditions.
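
The two allocation policies can be sketched as follows: static round-robin assignment of buckets to nodes, versus a frequency-based assignment that gives the largest buckets to the currently least-loaded node. This is a simplified single-machine illustration of the idea, not the paper's simulator or cost model; all names are hypothetical.

```python
# Sketch of the two bucket-allocation policies for a shared-nothing join
# with n_nodes processors: static round-robin vs. frequency-based.

import heapq
from collections import Counter

def round_robin_allocate(bucket_ids, n_nodes):
    # Static policy: ignores skew entirely.
    return {b: i % n_nodes for i, b in enumerate(bucket_ids)}

def frequency_allocate(join_keys, n_buckets, n_nodes):
    # Dynamic policy: hash-partition keys into buckets, then assign the
    # most populous buckets first to the least-loaded node (greedy LPT).
    freq = Counter(hash(k) % n_buckets for k in join_keys)
    loads = [(0, node) for node in range(n_nodes)]   # (load, node) min-heap
    heapq.heapify(loads)
    assignment = {}
    for bucket, count in freq.most_common():         # largest buckets first
        load, node = heapq.heappop(loads)
        assignment[bucket] = node
        heapq.heappush(loads, (load + count, node))
    return assignment

# Skewed key distribution: one join key dominates.
keys = [0] * 900 + list(range(1, 101))
print(round_robin_allocate(range(16), 4))
print(frequency_allocate(keys, n_buckets=16, n_nodes=4))
```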

Management Strategies and the Growth Stages Analysis of Local Festival : Cases of Hampyeong Butterfly Festival and Hwacheon Sancheoneo Ice Festival (지역축제의 성장단계별분석과 관리전략 : 함평나비축제와 화천산천어축제를 중심으로)

  • Kim, Hyeonwook
    • The Journal of the Korea Contents Association
    • /
    • v.15 no.5
    • /
    • pp.537-549
    • /
    • 2015
  • The purpose of this study is to analyze the characteristics of two local festivals, each already recognized as a successful regional cultural festival, over time by applying product life cycle theory. The analysis shows that, in the introduction stage, the festival organizers focused mainly on settling the festivals' core programs and raising awareness of the festivals' themes in order to stimulate basic demand. In the growth stage, to keep demand increasing, they focused on the qualitative improvement of the core programs, the development of new programs, and the expansion of programs for visitors' convenience and safety; promotion strategies were also modified to publicize the festivals' contents and programs, and public relations were established and carried out not only domestically but also abroad. Lastly, in the maturity stage, to overcome declines in visitor numbers and economic effect, both festivals provided more sophisticated programs for visitors' convenience and safety, improved service quality by developing existing programs, offered economic benefits such as admission discounts or gift certificates, expanded the number of foreign visitors by strengthening the promotion begun in the growth stage, and enhanced the festivals' image through social contribution. The strategies for each stage therefore present significant policy implications for festival organizers who plan to establish a new festival or who run a festival experiencing tepid growth.

Methods for Integration of Documents using Hierarchical Structure based on the Formal Concept Analysis (FCA 기반 계층적 구조를 이용한 문서 통합 기법)

  • Kim, Tae-Hwan;Jeon, Ho-Cheol;Choi, Joong-Min
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.3
    • /
    • pp.63-77
    • /
    • 2011
  • The World Wide Web is a very large distributed digital information space. From its origins in 1991, the web has grown to encompass diverse information resources such as personal home pages, online digital libraries, and virtual museums. Some estimates suggest that the web currently includes over 500 billion pages in the deep web. The ability to search and retrieve information from the web efficiently and effectively is an enabling technology for realizing its full potential. With powerful workstations and parallel processing technology, efficiency is not a bottleneck; in fact, some existing search tools sift through gigabyte-size precompiled web indexes in a fraction of a second. But retrieval effectiveness is a different matter. Current search tools retrieve too many documents, of which only a small fraction are relevant to the user query, and the most relevant documents do not necessarily appear at the top of the query output order. Current search tools also cannot retrieve the documents related to a retrieved document from the gigantic pool of documents. The most important problem for many current search systems is to increase the quality of search, that is, to provide related documents while keeping the number of unrelated documents in the results as low as possible. For this problem, CiteSeer proposed ACI (Autonomous Citation Indexing) of articles on the World Wide Web. A citation index indexes the links between articles that researchers make when they cite other articles; such indexes are very useful for a number of purposes, including literature search and analysis of the academic literature. In this scheme, references contained in academic articles give credit to previous work and provide a link between the "citing" and "cited" articles, and the index links each article with the works it cites. Citation indexes were originally designed mainly for information retrieval: the citation links allow navigating the literature in unique ways, since papers can be located independent of language and of the words in the title, keywords, or document text, and the index allows navigation backward in time (the list of cited articles) and forward in time (which subsequent articles cite the current article). But CiteSeer cannot index links between articles that researchers do not make, because it indexes only the links researchers create when citing other articles, and for the same reason it does not scale easily. All these problems orient us toward designing a more effective search system. This paper presents a method that extracts a subject and predicate from each sentence in a document. Each document is converted into a tabular form in which every extracted predicate is checked against its possible subjects and objects. We build a hierarchical graph of each document from this table and then integrate the graphs of the documents. Against the graph of the entire document collection, we calculate the area of each document relative to the integrated documents and mark the relations among documents by comparing these areas. We also propose a method for the structural integration of documents that retrieves documents from the graph, making it easier for users to find information. We compared the performance of the proposed approaches with the Lucene search engine using ranking formulas. As a result, the F-measure is about 60%, which is about 15% better.
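
Reading the method loosely, each document can be reduced to a set of (subject, predicate, object) edges, the per-document graphs merged, and relatedness scored by how much of one document's edge set (its "area") overlaps the integrated graph; the F-measure used in the evaluation is also shown. This sketch only approximates the paper's hierarchical-graph construction, and all names and data are illustrative.

```python
# Rough sketch: documents as edge sets, graph integration, area-based
# relatedness, and the F-measure used to compare against Lucene-style
# ranked retrieval. Not the paper's actual algorithm.

def doc_graph(triples):
    return set(triples)            # edges of the per-document graph

def integrate(graphs):
    merged = set()
    for g in graphs:
        merged |= g                # structural integration of documents
    return merged

def relatedness(doc, others):
    # Share of the document's edges also present in the other documents,
    # standing in for the paper's area comparison.
    merged = integrate(others)
    return len(doc & merged) / len(doc) if doc else 0.0

def f_measure(retrieved, relevant):
    tp = len(retrieved & relevant)
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

d1 = doc_graph([("web", "contains", "pages"), ("index", "covers", "web")])
d2 = doc_graph([("web", "contains", "pages"), ("query", "matches", "index")])
print(relatedness(d1, [d2]))                        # 0.5: 1 of 2 edges shared
print(f_measure({"a", "b", "c"}, {"b", "c", "d"}))  # ~0.667
```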