• Title/Summary/Keyword: document search (382 results)

Research on Function and Policy for e-Government System using Semantic Technology (전자정부내 의미기반 기술 도입에 따른 기능 및 정책 연구)

  • Jang, Young-Cheol
    • Journal of Korea Society of Industrial Information Systems / v.13 no.5 / pp.22-28 / 2008
  • This paper aims to offer a solution based on semantic document classification that improves the utilization and efficiency of e-Government for people who use their own information retrieval systems and linguistic expressions. Semantic document classification generally classifies documents based on the diverse relationships between keywords in a document, without fully describing the hierarchical concepts between those keywords. Our approach considers the deeper meanings within the context of the document and substantially enhances information retrieval performance. We propose, test, and evaluate the Concept Weight Document Classification (CoWDC) method, which goes beyond existing keyword and simple thesaurus/ontology methods by fully considering the hierarchy among concepts. The test results of CoWDC verify the superiority of the semantic retrieval technology; integrating it efficiently into e-Government, however, requires building a thesaurus, managing the operating system, expanding the knowledge base, and improving search services and accuracy at the national level.
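The abstract does not give the CoWDC weighting formula. As a rough illustration of what "considering the concept hierarchy" can mean, the Python sketch below assumes a hypothetical child-to-parent concept map and a hypothetical per-level `decay` factor: each concept's count is propagated to its ancestors, and a document is assigned to the top-level category with the largest accumulated weight. The names `PARENT`, `concept_weights`, and `classify` are illustrative only.

```python
# Hypothetical concept hierarchy (child -> parent); not taken from the paper.
PARENT = {
    "tax refund": "taxation",
    "income tax": "taxation",
    "taxation": "finance",
    "pension": "welfare",
    "finance": None,
    "welfare": None,
}

def concept_weights(term_counts, parent=PARENT, decay=0.5):
    """Propagate each concept's count to all of its ancestors.

    The weight added to an ancestor shrinks by `decay` (an assumed
    parameter) for every level it sits above the concept.
    """
    weights = {}
    for concept, count in term_counts.items():
        node, factor = concept, 1.0
        while node is not None:
            weights[node] = weights.get(node, 0.0) + count * factor
            node, factor = parent.get(node), factor * decay
    return weights

def classify(term_counts, categories=("finance", "welfare")):
    """Pick the top-level category with the largest accumulated weight."""
    weights = concept_weights(term_counts)
    return max(categories, key=lambda c: weights.get(c, 0.0))

doc = {"tax refund": 2, "income tax": 1, "pension": 1}
print(classify(doc))  # finance
```

None of the document's terms literally equals "finance"; propagating weights up the hierarchy is what lets plain keyword counts be classified under that broader category.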

Design and Implementation of Tag Coupling-based Boolean Query Matching System for Ranked Search Result (태그결합을 이용한 불리언 검색에서 순위화된 검색결과를 제공하기 위한 시스템 설계 및 구현)

  • Kim, Yong;Joo, Won-Kyun
    • Journal of the Korean Society for Information Management / v.29 no.4 / pp.101-121 / 2012
  • Because IR systems that adopt only the Boolean IR model cannot provide ranked search results, users must check huge result sets one by one, which is time-consuming. This study proposes a method that ranks search results in the Boolean IR model by using coupling information between tags instead of index weight information. Because the proposed method uses document queries instead of general user queries, the key tags of a relevant document are extracted and used as queries. While the queries are being extracted, various groups of Boolean queries based on tag couplings are created. Ranked search results are then obtained through matching that uses the differential information among the query groups and the tag significance information. To demonstrate the usability of the proposed method, an experiment was conducted to find research trend analysis information for selected research information. In addition, a service based on the proposed method was offered for one year to collect user feedback, and the results showed high user satisfaction.
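The abstract does not specify how coupled queries are scored. The sketch below assumes one simple reading: every pair of key tags forms a Boolean AND query (a "tag coupling"), and each document scores the summed significance of the coupled queries it satisfies, which turns a flat Boolean result set into a ranked list. The tag weights, the pairwise coupling, and the scoring rule are assumptions for illustration, not the authors' formula.

```python
from itertools import combinations

def build_coupled_queries(key_tags):
    """One Boolean AND query per pair of key tags (assumed reading of 'tag coupling')."""
    return [frozenset(pair) for pair in combinations(sorted(key_tags), 2)]

def rank_documents(key_tags, documents):
    """Rank documents by the summed significance of the coupled queries they satisfy.

    key_tags:  tag -> significance weight (assumed to be given)
    documents: doc_id -> set of tags attached to that document
    """
    queries = build_coupled_queries(key_tags)
    scores = {}
    for doc_id, tags in documents.items():
        # An AND query matches only if all of its tags occur in the document.
        matched = [q for q in queries if q <= tags]
        if matched:
            scores[doc_id] = sum(key_tags[t] for q in matched for t in q)
    # Highest score first; ties broken by doc_id so the order is stable.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

key_tags = {"ontology": 0.9, "retrieval": 0.7, "thesaurus": 0.4}
documents = {
    "d1": {"ontology", "retrieval", "thesaurus"},
    "d2": {"ontology", "retrieval"},
    "d3": {"retrieval", "thesaurus"},
    "d4": {"ontology"},
}
print(rank_documents(key_tags, documents))
```

A document that satisfies no coupled query (here `d4`, which carries only one key tag) is left out, just as it would be missing from a strict Boolean AND result.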

Design of a RDF Metadata System for the Searching of Application Programs (응용프로그램의 검색을 위한 RDF 메타데이터 시스템의 설계)

  • Yoo Weon-Hee;Kouh Hoon-Joon
    • The Journal of the Korea Contents Association / v.5 no.6 / pp.1-9 / 2005
  • As the amount of data on the web increases, it becomes difficult to find exactly what we want, and much research has therefore been devoted to searching web resources efficiently. The W3C established RDF metadata as a standard for giving meaning to resources on the web. RDF metadata has mainly been used to describe document data on the web, but metadata for application programs is harder to create automatically than metadata for documents. This paper proposes a method that uses RDF metadata to search for application programs. First, we define an RDF data model that stores information about application programs and an RDF schema that references the RDF data model. We then design a prototype system for searching application programs. The system meets expectations: it finds applications that fulfill the user's needs and provides an efficient search function.
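To make the idea concrete, the sketch below stores application metadata as plain subject-predicate-object triples in Python and answers a query by intersecting the subjects that match each property. The `app:` vocabulary (`app:Application`, `app:category`, `app:platform`) and the sample applications are hypothetical, not the schema defined in the paper; a real system would more likely use an RDF store queried with SPARQL.

```python
# Hypothetical triples describing application programs (illustrative vocabulary).
triples = [
    ("app:EditorA", "rdf:type", "app:Application"),
    ("app:EditorA", "app:category", "editor"),
    ("app:EditorA", "app:platform", "Windows"),
    ("app:EditorB", "rdf:type", "app:Application"),
    ("app:EditorB", "app:category", "editor"),
    ("app:EditorB", "app:platform", "Linux"),
    ("app:PaintC", "rdf:type", "app:Application"),
    ("app:PaintC", "app:category", "graphics"),
    ("app:PaintC", "app:platform", "Linux"),
]

def find_applications(triples, **constraints):
    """Return subjects typed as app:Application that satisfy every property constraint."""
    apps = {s for s, p, o in triples if p == "rdf:type" and o == "app:Application"}
    for prop, value in constraints.items():
        # Keep only subjects that have the triple (subject, app:<prop>, value).
        apps &= {s for s, p, o in triples if p == f"app:{prop}" and o == value}
    return sorted(apps)

print(find_applications(triples, category="editor", platform="Linux"))  # ['app:EditorB']
```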

Exploiting Query Proximity and Graph Profiling Method for Tag-based Personalized Search in Folksonomy (질의어의 근접성 정보 및 그래프 프로파일링 기법을 이용한 태그 기반 개인화 검색)

  • Han, Keejun;Jang, Jincheul;Yi, Mun Yong
    • Journal of KIISE / v.41 no.12 / pp.1117-1125 / 2014
  • Folksonomy data, derived from social tagging systems, is a useful source for understanding a user's intentions and interests. Folksonomy data makes it possible to create accurate user profiles that can be used to build a personalized search system. However, traditional methods such as the Vector Space Model (VSM) have limitations for user profiling and similarity computation. This paper proposes a novel method based on graph-based user and document profiles that uses the proximity information of query terms to improve personalized search. We evaluate the proposed method by comparing it with several state-of-the-art VSM-based personalization models on two different folksonomy datasets. The results show that the proposed model consistently outperforms the other state-of-the-art personalization models. Furthermore, parameter sensitivity results show that the proposed model is parameter-free, in that its performance is not affected by the idiosyncrasies of individual datasets.

A Streaming XML Parser Supporting Adaptive Parallel Search (적응적 병렬 검색을 지원하는 스트리밍 XML 파서)

  • Lee, Kyu-Hee;Han, Sang-Soo
    • Journal of the Korea Institute of Information and Communication Engineering / v.17 no.8 / pp.1851-1856 / 2013
  • XML is widely used in web services such as SOAP (Simple Object Access Protocol) and REST (Representational State Transfer), and is also the de facto standard for representing data. Because an XML parser using DOM (Document Object Model) must first build a DOM tree and store it in memory, embedded systems with limited resources typically use a streaming XML parser that needs no such preprocessing. In this paper, we propose a new architecture for a streaming XML parser that uses APSearch (Adaptive Parallel Search) on an FPGA (Field Programmable Gate Array). Compared with other approaches, the proposed APSearch parser dramatically reduces software-side overhead and improves XML parsing time by factors of about 2.55 and 2.96. The APSearch parser is therefore suitable for systems that need faster XML parsing.
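The APSearch parser itself is FPGA hardware and is not reproduced here. For contrast with DOM parsing, the Python sketch below shows the general streaming style in software using the standard library's `xml.etree.ElementTree.iterparse`: elements are handled as their end tags arrive, and finished subtrees are cleared instead of being kept as a full tree in memory. The sample document and the `total_price` function are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample message; any well-formed XML stream would do.
SAMPLE = b"""<Envelope><Body>
  <Item id="1"><Price>10</Price></Item>
  <Item id="2"><Price>25</Price></Item>
  <Item id="3"><Price>7</Price></Item>
</Body></Envelope>"""

def total_price(stream):
    """Sum <Price> values in one streaming pass instead of building a DOM tree first."""
    total = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "Price":
            total += int(elem.text)
        elif elem.tag == "Item":
            # The item is complete; drop its children to keep memory use small.
            elem.clear()
    return total

print(total_price(io.BytesIO(SAMPLE)))  # 42
```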

Development of a Regulatory Q&A System for KAERI Utilizing Document Search Algorithms and Large Language Model (거대언어모델과 문서검색 알고리즘을 활용한 한국원자력연구원 규정 질의응답 시스템 개발)

  • Hongbi Kim;Yonggyun Yu
    • Journal of Korea Society of Industrial Information Systems / v.28 no.5 / pp.31-39 / 2023
  • The evolution of Natural Language Processing (NLP) and the rise of large language models (LLMs) such as ChatGPT have paved the way for question-answering (QA) systems tailored to specific domains. This study describes a system that combines an LLM with document search algorithms to interpret and answer user inquiries using documents from the Korea Atomic Energy Research Institute (KAERI). First, the system refines multiple documents for search and analysis, splitting their content into manageable paragraphs that the language model can process. Each paragraph is converted into a vector by an embedding model and stored in a database. When a user query arrives, the system compares the vector extracted from the question with the stored vectors to find the most relevant content. The selected paragraphs, combined with the user's query, are then passed to the language generation model, which formulates a response. Tests covering a range of questions confirmed that the system identifies question intent, understands diverse documents, and delivers fast and accurate answers.
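The retrieval steps described in this abstract (split into paragraphs, embed, store, match the query vector, and pass the best paragraphs to the generator) can be sketched in Python as follows. As assumptions, a bag-of-words term-frequency vector stands in for the embedding model, an in-memory list stands in for the vector database, cosine similarity is used for matching, and the LLM call is left out; only the prompt that would be sent to it is built. The sample text is made up and is not from KAERI's regulations.

```python
import math
import re
from collections import Counter

def split_paragraphs(text):
    """Break a document into paragraphs on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def embed(text):
    """Stand-in embedding: a bag-of-words term-frequency vector."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(documents):
    """Store (paragraph, vector) pairs, playing the role of the vector database."""
    return [(p, embed(p)) for doc in documents for p in split_paragraphs(doc)]

def retrieve(index, query, k=2):
    """Return the k paragraphs whose vectors are most similar to the query vector."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [p for p, _ in ranked[:k]]

def build_prompt(query, paragraphs):
    context = "\n\n".join(paragraphs)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

docs = ["Travel expenses are reimbursed within 30 days.\n\nOvertime requires prior approval.",
        "Laboratory access requires safety training."]
index = build_index(docs)
question = "When are travel expenses reimbursed?"
top = retrieve(index, question, k=1)
print(top)  # ['Travel expenses are reimbursed within 30 days.']
prompt = build_prompt(question, top)  # this string would be sent to the LLM
```

Swapping in a real embedding model and vector database changes only `embed` and `build_index`; the match-then-generate flow stays the same.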

Symmetric Searchable Encryption with Efficient Conjunctive Keyword Search

  • Jho, Nam-Su;Hong, Dowon
    • KSII Transactions on Internet and Information Systems (TIIS) / v.7 no.5 / pp.1328-1342 / 2013
  • Searchable encryption is a cryptographic protocol for searching for documents in encrypted databases. A simple searchable encryption protocol that can use only one keyword at a time is very limited and cannot satisfy the demands of many applications. Designing searchable encryption with useful additional functions, such as conjunctive keyword search, is therefore one of the most important goals. There have been many attempts to construct searchable encryption with conjunctive keyword search. However, most previously proposed protocols are based on public-key cryptosystems, which incur a large computational cost. Moreover, the amount of computation in their search procedures depends on the number of documents stored in the database, so they are not suitable for extremely large data sets. In this paper, we propose a new searchable encryption protocol with conjunctive keyword search based on a linked tree structure instead of public-key techniques. The protocol requires a remarkably small computational cost, particularly when applied to extremely large databases: the amount of computation in the search procedure depends on the number of documents that match the query rather than on the size of the entire database.
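The abstract does not describe the linked tree construction in enough detail to reproduce it. As a swapped-in, much simpler illustration of keyword-token search, the sketch below has the client derive search tokens with HMAC-SHA256, so the server's index holds tokens rather than plaintext keywords, and answers a conjunctive query by intersecting posting sets starting from the smallest one, so the work is tied to the posting lists involved rather than to the whole database. This is a toy: it is not the paper's protocol and is not secure as written (document ids are stored in the clear, and access and search patterns leak).

```python
import hashlib
import hmac

def token(key: bytes, keyword: str) -> bytes:
    """Deterministic search token: HMAC-SHA256 of the keyword under the client's key."""
    return hmac.new(key, keyword.encode(), hashlib.sha256).digest()

def build_encrypted_index(key, documents):
    """Server-side index mapping tokens (not plaintext keywords) to document ids."""
    index = {}
    for doc_id, keywords in documents.items():
        for kw in keywords:
            index.setdefault(token(key, kw), set()).add(doc_id)
    return index

def conjunctive_search(index, tokens):
    """Intersect posting sets for all tokens, starting from the smallest one."""
    postings = sorted((index.get(t, set()) for t in tokens), key=len)
    if not postings:
        return set()
    result = set(postings[0])
    for p in postings[1:]:
        result &= p
    return result

key = b"client-secret-key"  # hypothetical client key
docs = {1: {"budget", "2013"}, 2: {"budget", "audit"}, 3: {"audit", "2013"}}
index = build_encrypted_index(key, docs)
print(conjunctive_search(index, [token(key, "budget"), token(key, "2013")]))  # {1}
```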

Improving the Performance of Web Search using Query Types (질의유형에 기반한 웹 검색의 성능 향상)

  • Kang, In-Ho;An, Dong-Un
    • The KIPS Transactions:PartB / v.11B no.5 / pp.537-544 / 2004
  • The Web is rich in diverse sources of information. Because web document collections are massive and heterogeneous, users look for various types of target pages. Each type of information used in Web search works well for its own designated type of query; if a user query is not of that type, the results are poor. A search engine therefore needs different strategies to exploit the strengths of each type of information. If we know the properties of the information, we can refine candidate pages and rank them more precisely. We conduct various experiments to show the properties of each type of information and, based on them, present an appropriate formula for combining the types of information. In addition, for service-finding tasks, we propose Service Link Information, which exploits the presence of mechanisms for user interaction.

An Integrated Database of Engineering Documents and CAD/CAE Information for the Support of Bridge Maintenance (교량 유지관리 지원을 위한 CAD/CAE 정보와 엔지니어링 문서정보의 통합 데이터베이스)

  • Jeong Y.S.;Kim B.G.;Lee S.H.
    • Korean Journal of Computational Design and Engineering / v.11 no.3 / pp.183-196 / 2006
  • A new operation strategy that can guarantee the consistency of engineering information about bridges across various intelligent information systems is presented, and a methodology for constructing an integrated database is developed to support the strategy. Two core standards are adopted to construct the integrated database: the Standard for the Exchange of Product Model Data (STEP) for CAD/CAE information, and the Extensible Markup Language (XML) for engineering document information. The former enables structural engineers to handle structural details using three-dimensional, geometry-based information about bridges, and the ACIS solid modeling kernel is used to develop AutoCAD-based application modules. The latter turns document files into data that web-based application modules can use to help end users search and retrieve engineering documents. In addition, a relaying algorithm is developed to integrate the two different kinds of information, i.e., CAD/CAE information and engineering document information. Pilot application modules are also developed, and a case study of the Han-Nam bridge is presented at the end of the paper to illustrate their use.

A Study on the DB-IR Integration: Per-Document Basis Online Index Maintenance

  • Jin, Du-Seok;Jung, Hoe-Kyung
    • Journal of Information and Communication Convergence Engineering / v.7 no.3 / pp.275-280 / 2009
  • Although databases (DB) and information retrieval (IR) have been developed independently, information systems in areas such as health care, customer support, XML data management, and digital libraries increasingly need to support both data management and efficient text retrieval at the same time. The divide between DB and IR has led to different approaches to index maintenance for newly arriving documents. DB systems have extended their SQL layer to handle text fields because they lack a built-in mechanism for IR-style indexes, whereas IR systems usually treat a block of new documents as the logical unit of index maintenance because they have no concept of integrity constraints. In DB-IR integration, however, a transaction that adds or updates a document must also maintain the posting lists associated with that document. Although DB-IR integration has emerged as a research field, it will remain a difficult and rewarding area for some time, one of the primary reasons being the lack of efficient online transactional index maintenance. In this paper, we evaluate the performance of several strategies for per-document transactional index maintenance: direct index update, pulsing auxiliary index, and posting segmentation index. The results show that the pulsing auxiliary strategy and the posting segmentation indexing scheme can be competitive candidates for text field indexing in DB-IR integration.
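The abstract names the strategies but not their mechanics. As a simplified illustration of the auxiliary-index idea, the Python sketch below indexes each new document into a small auxiliary index, merges that index into the main one in a batch once it holds `merge_threshold` documents, and has searches consult both indexes so that unmerged documents remain visible. The class name, the threshold, and the whitespace tokenizer are assumptions; the paper's pulsing scheme may differ, and transactions and concurrency are ignored here.

```python
from collections import defaultdict

class AuxiliaryIndex:
    """Per-document index maintenance with a small auxiliary index (illustrative)."""

    def __init__(self, merge_threshold=2):
        self.main = defaultdict(list)   # term -> sorted doc ids (the large index)
        self.aux = defaultdict(list)    # term -> doc ids added since the last merge
        self.aux_docs = 0
        self.merge_threshold = merge_threshold

    def add_document(self, doc_id, text):
        # Each added document touches only the small auxiliary posting lists.
        for term in set(text.lower().split()):
            self.aux[term].append(doc_id)
        self.aux_docs += 1
        if self.aux_docs >= self.merge_threshold:
            self._merge()

    def _merge(self):
        # Fold the auxiliary postings into the main index in one batch.
        for term, postings in self.aux.items():
            self.main[term] = sorted(self.main[term] + postings)
        self.aux.clear()
        self.aux_docs = 0

    def search(self, term):
        # Consult both indexes so documents not yet merged are still found.
        term = term.lower()
        return sorted(self.main.get(term, []) + self.aux.get(term, []))

idx = AuxiliaryIndex(merge_threshold=2)
idx.add_document(1, "online index maintenance")
print(idx.search("index"), len(idx.main))  # [1] 0
idx.add_document(2, "index update")
print(idx.search("index"), len(idx.main))  # [1, 2] 4
```

Compared with updating the main posting lists directly on every insert, this design trades a slightly more expensive query (two lookups) for cheaper per-document updates between merges.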