• Title/Summary/Keyword: Query processing

Query-based Answer Extraction using Korean Dependency Parsing (의존 구문 분석을 이용한 질의 기반 정답 추출)

  • Lee, Dokyoung;Kim, Mintae;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.161-177 / 2019
  • In this paper, we study how sentence dependency parsing results can improve answer extraction in a Question-Answering (QA) system. A QA system consists of query analysis, which analyzes the user's query, and answer extraction, which extracts appropriate answers from documents, and various studies have been conducted on both. Improving answer extraction requires accurately reflecting the grammatical information of sentences. Because Korean has free word order and frequently omits sentence components, dependency parsing is a good way to analyze Korean syntax. Therefore, in this study, we improved answer extraction performance by adding features generated from dependency parsing to the inputs of the answer extraction model (Bidirectional LSTM-CRF). We compared the performance of the answer extraction model given only basic word features, generated without dependency parsing, against its performance when the Eojeol tag feature and the dependency graph embedding feature were added. Since dependency parsing operates on the Eojeol, the basic sentence unit delimited by spaces, the tag information of each Eojeol is obtained as a by-product of dependency parsing; the Eojeol tag feature is this tag information. The process of generating the dependency graph embedding consists of generating the dependency graph from the dependency parsing result and then learning an embedding of that graph. 
From the dependency parsing result, a graph is generated by mapping each Eojeol to a node, each dependency between Eojeols to an edge, and each Eojeol tag to a node label. Depending on whether the direction of the dependency relation is considered, either an undirected or a directed graph is generated. To obtain the graph embedding, we used Graph2Vec, which learns an embedding of a graph from the subgraphs that constitute it. The maximum path length between nodes can be specified when extracting a graph's subgraphs. If the maximum path length is 1, the graph embedding reflects only direct dependencies between Eojeols; as the maximum path length grows, indirect dependencies are also included. In the experiment, the maximum path length between nodes was varied from 1 to 3, with and without the direction of dependency, and answer extraction performance was measured for each setting. Experimental results show that both the Eojeol tag feature and the dependency graph embedding feature improve answer extraction performance. In particular, the highest performance was obtained when the model input was the embedding of the directed dependency graph generated with a maximum path length of 1 in Graph2Vec's subgraph extraction. From these experiments, we concluded that it is better to take the direction of dependency into account and to consider only direct rather than indirect dependencies between words. The significance of this study is as follows. First, we improved answer extraction performance by adding features derived from dependency parsing, taking into account the characteristics of Korean: free word order and frequent omission of sentence components. 
Second, we generated dependency parsing features with a learning-based graph embedding method, without predefining dependency patterns between Eojeols. Future research directions are as follows. In this study, the features generated from dependency parsing, intended to capture sentence meaning, were applied only to the answer extraction model. If, in future work, these features are applied to other natural language processing models such as sentiment analysis or named entity recognition and their performance is confirmed, the validity of the features can be verified more accurately.
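The graph construction and subgraph extraction steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `(tag, head)` input format, the tag names, and the dependent-to-head edge direction are hypothetical, and Graph2Vec's rooted-subgraph degree (the number of Weisfeiler-Lehman relabelling rounds) is interpreted here as the "maximum path length" above. The resulting subgraph words would then be fed to a PV-DBOW document-embedding model (for example gensim's `Doc2Vec` with `dm=0`), which is omitted.

```python
from collections import defaultdict


def build_dependency_graph(eojeols, directed=True):
    """Build a labelled dependency graph from one parsed sentence.

    eojeols: list of (tag, head) pairs, one per Eojeol, where head is the index
    of the governing Eojeol or -1 for the root (hypothetical input format).
    """
    labels = {i: tag for i, (tag, _) in enumerate(eojeols)}  # node -> Eojeol tag
    adj = defaultdict(set)
    for i, (_, head) in enumerate(eojeols):
        if head < 0:
            continue  # the root has no governor
        adj[i].add(head)  # edge from dependent to head (assumed direction)
        if not directed:
            adj[head].add(i)  # undirected graph: also link head back to dependent
    return labels, adj


def wl_subgraph_words(labels, adj, max_depth):
    """Graph2Vec-style rooted-subgraph 'words' via Weisfeiler-Lehman relabelling.

    Depth 0 yields the node labels; each further round appends the sorted labels
    of a node's neighbours, so depth d covers neighbours up to d hops away.
    """
    current = dict(labels)
    words = list(current.values())
    for _ in range(max_depth):
        # The comprehension reads the previous round's labels before rebinding.
        current = {
            node: current[node]
            + "("
            + ",".join(sorted(current[n] for n in adj.get(node, ())))
            + ")"
            for node in current
        }
        words.extend(current.values())
    return words
```

With `max_depth=1`, each word encodes an Eojeol tag plus its immediate neighbours, which corresponds to the direct-dependency setting reported as best above.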

Smart Browser based on Semantic Web using RFID Technology (RFID 기술을 이용한 시맨틱 웹 기반 스마트 브라우저)

  • Song, Chang-Woo;Lee, Jung-Hyun
    • The Journal of the Korea Contents Association / v.8 no.12 / pp.37-44 / 2008
  • Data entered into RFID tags are used to save costs and enhance competitiveness in the development of applications across various industrial areas. RFID readers identify and search for hundreds of tagged objects. RFID technology, which identifies objects for dynamic linking and tracking, consists of application components that support the information infrastructure. Despite their many advantages, existing applications, which do not consider real-time data communication among remote RFID devices, cannot effectively support connections among heterogeneous devices. Because different network devices are installed separately in each application and go through different query analysis processes, monitoring delays and data conversion errors occur. The present study implements an RFID database handling system in a semantic Web environment for the integrated management of information extracted from RFID tags, regardless of the application. Users' RFID tags are identified by an RFID reader mounted on an application; the data are sent to the RFID database processing system, which converts the information into a semantic Web language. Data transmitted in this standardized semantic Web form are translated by a smart browser and displayed on the screen. The use of a semantic Web language enables reasoning about meaningful relations, which in turn makes it easy to extend functionality by adding modules.
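The conversion step, in which RFID read data are turned into a semantic Web language, can be sketched as emitting RDF N-Triples. The `http://example.org/rfid#` vocabulary, the property names, and the event IRI scheme are hypothetical; the abstract does not specify which ontology or serialization the system uses.

```python
from datetime import datetime, timezone

RFID = "http://example.org/rfid#"  # hypothetical vocabulary, not a published ontology
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def read_event_to_ntriples(tag_id, reader_id, read_at):
    """Turn one RFID read (tag, reader, timestamp) into N-Triples lines.

    Assumes tag_id and reader_id are already IRI-safe (e.g. hex EPC codes).
    """
    event = f"<{RFID}event/{reader_id}/{tag_id}/{read_at.strftime('%Y%m%dT%H%M%S')}>"
    return [
        f"{event} <{RDF_TYPE}> <{RFID}ReadEvent> .",
        f"{event} <{RFID}tag> <{RFID}tag/{tag_id}> .",
        f"{event} <{RFID}reader> <{RFID}reader/{reader_id}> .",
        # Timezone-aware isoformat() output is a valid xsd:dateTime lexical form.
        f'{event} <{RFID}readAt> "{read_at.isoformat()}"^^<{XSD_DATETIME}> .',
    ]


if __name__ == "__main__":
    when = datetime(2008, 12, 1, 9, 30, tzinfo=timezone.utc)
    for line in read_event_to_ntriples("E200341201", "reader-01", when):
        print(line)
```

Once the reads are in RDF, a browser or reasoner can consume them independently of the reader hardware, which is the integration benefit the abstract describes.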

Homonym Disambiguation based on Mutual Information and Sense-Tagged Compound Noun Dictionary (상호정보량과 복합명사 의미사전에 기반한 동음이의어 중의성 해소)

  • Heo, Jeong;Seo, Hee-Cheol;Jang, Myung-Gil
    • Journal of KIISE:Software and Applications / v.33 no.12 / pp.1073-1089 / 2006
  • The goal of Natural Language Processing (NLP) is to make a computer understand natural language and deliver the meanings of natural language to humans. Word Sense Disambiguation (WSD) is a very important technology for achieving this goal. In this paper, we describe a technology for automatic homonym disambiguation using both Mutual Information (MI) and a Sense-Tagged Compound Noun Dictionary. Previous research using word definitions in dictionaries suffered from data sparseness because it relied on exact word matching. Our work overcomes this problem by using MI, an association measure between words. To reflect language features, the rate of word pairs with MI values, sense frequency, and site of word definitions are used as weights in our system. We constructed a Sense-Tagged Compound Noun Dictionary for high-frequency compound nouns and used it for homonym disambiguation. Experimental data for testing and evaluating our system were constructed from QA (Question Answering) test data consisting of about 200 query sentences and answer paragraphs. We performed four types of experiments. Using MI alone, the system achieved a precision of 65.06%. With the weighted values, it achieved a precision of 85.35%, and with the Sense-Tagged Compound Noun Dictionary, a precision of 88.82%.
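The core MI-based scoring can be sketched as follows, assuming pointwise mutual information estimated from raw counts, with one total count `n` shared by unigrams and pairs, and a hypothetical `sense_weight` map standing in for the sense-frequency weighting. The other weights described above and the compound noun dictionary lookup are omitted; this is not the authors' exact formula.

```python
import math


def pmi(cooc, freq, n, w1, w2):
    """Pointwise MI: log2(P(w1, w2) / (P(w1) P(w2))) = log2(c12 * n / (c1 * c2))."""
    c12 = cooc.get(frozenset((w1, w2)), 0)
    if c12 == 0 or freq.get(w1, 0) == 0 or freq.get(w2, 0) == 0:
        return 0.0  # unseen pair: treated as no association (assumption)
    return math.log2(c12 * n / (freq[w1] * freq[w2]))


def disambiguate(context, senses, cooc, freq, n, sense_weight=None):
    """Pick the sense whose definition words associate most with the context.

    senses: mapping sense_id -> list of definition words.
    sense_weight: optional sense_id -> multiplier (hypothetical stand-in for
    the sense-frequency weight).
    """
    sense_weight = sense_weight or {}

    def score(sense):
        # Sum only positive associations, so unrelated pairs do not penalize.
        total = sum(
            max(pmi(cooc, freq, n, c, d), 0.0)
            for c in context
            for d in senses[sense]
        )
        return total * sense_weight.get(sense, 1.0)

    return max(senses, key=score)
```

Because MI compares association strength rather than requiring an exact match between context words and definition words, a sense can still be selected when the definition never literally appears next to the homonym, which is how MI eases the sparseness problem mentioned above.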

Synthetic Trajectory Generation Tool for Indoor Moving Objects (실내공간 이동객체 궤적 생성기)

  • Ryoo, Hyung Gyu;Kim, Soo Jin;Li, Ki Joune
    • Journal of Korean Society for Geospatial Information Science / v.24 no.4 / pp.59-66 / 2016
  • Performance experiments on moving object database systems require moving object trajectory data sets. For example, benchmark data sets of moving object trajectories are required for experiments on query processing in moving object databases. For this reason, several tools have been developed to generate moving objects in Euclidean or road network spaces. Indoor space differs from outdoor space in many respects, and a moving object generator for indoor space should reflect these differences. Although some tools have been developed to produce virtual moving object trajectories in indoor space, the movements they generate are not realistic. In this paper, we present a moving object generation tool for indoor space. The tool generates pedestrian trajectories in indoor space and supports parametric generation that considers not only speed, number of pedestrians, and minimum distance between pedestrians but also type of space, time constraints, and type of pedestrian. We try to reflect the movement patterns of pedestrians in indoor space as realistically as possible. For interoperability, several geospatial standards are used in the development of the tool.
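A basic building block of such a generator is sampling timestamped positions along a pedestrian's route at a given speed. The sketch below assumes straight-line movement between hypothetical waypoints (for example, door or room centroids) and ignores walls, space types, time constraints, and minimum-distance checks, which the tool described above also handles.

```python
import math


def sample_trajectory(waypoints, speed, dt, t0=0.0):
    """Sample (t, x, y) every dt seconds along a polyline walked at constant speed."""
    if len(waypoints) < 2 or speed <= 0 or dt <= 0:
        raise ValueError("need >= 2 waypoints and positive speed and dt")
    # Cumulative path length at each waypoint.
    cum = [0.0]
    for (x1, y1), (x2, y2) in zip(waypoints, waypoints[1:]):
        cum.append(cum[-1] + math.hypot(x2 - x1, y2 - y1))
    total = cum[-1]

    samples = []
    seg = 0  # index of the segment containing the current distance
    k = 0
    while True:
        s = k * dt * speed  # distance walked at sample k
        if s > total:
            break
        while seg < len(cum) - 2 and cum[seg + 1] < s:
            seg += 1  # advance to the segment that contains distance s
        (x1, y1), (x2, y2) = waypoints[seg], waypoints[seg + 1]
        seg_len = cum[seg + 1] - cum[seg]
        f = 0.0 if seg_len == 0 else (s - cum[seg]) / seg_len
        samples.append((t0 + k * dt, x1 + f * (x2 - x1), y1 + f * (y2 - y1)))
        k += 1

    # Make sure the trajectory ends exactly at the destination.
    if samples[-1][1:] != tuple(waypoints[-1]):
        samples.append((t0 + total / speed, *waypoints[-1]))
    return samples
```

For example, `sample_trajectory([(0, 0), (4, 0), (4, 3)], speed=1.0, dt=2.0)` yields samples at t = 0, 2, 4, 6 and a final point at t = 7 on the destination. Per-pedestrian speeds and start times would be the natural hooks for the parameters the tool exposes.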

Design and Development of Middleware for Clinical Trial System based on Brain MR Image (뇌 MR 영상기반 임상연구 시스템을 위한 미들웨어 설계 및 개발)

  • Jeon, Woong-Gi;Park, Kyoung-Jong;Lee, Young-Seung;Choi, Hyun-Ju;Jeong, Sang-Wook;Kim, Dong-Eog;Choi, Heung-Kook
    • Journal of Korea Multimedia Society / v.15 no.6 / pp.805-813 / 2012
  • In this paper, we have designed and developed middleware that provides effective database access for an existing brain disease clinical research system. The brain disease clinical research system consists of two parts: a register and an analyzer. The register collects registration data, and the analyzer produces statistics based on diverse variables. The middleware was designed for database management and for processing large data queries from clients. By separating each function into its own module, we implemented loosely coupled, reusable modules. The image data module uses a new image-to-text compression method for efficient management and storage in the database. We tested the middleware system using 700 actual clinical medical records. As a result, total data transmission was up to 115 times faster than with the existing system. The improved module structure also enables robust and reliable system operation and enhanced security. In the future, such middleware is expected to become increasingly important for building large medical databases.
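The abstract does not describe the image-to-text compression method itself. As a generic stand-in only (not the authors' method), image bytes can be stored in a text column by compressing them with zlib and encoding the result in Base64:

```python
import base64
import zlib


def image_to_text(image_bytes: bytes, level: int = 9) -> str:
    """Compress raw image bytes with zlib, then Base64-encode them as ASCII text."""
    return base64.b64encode(zlib.compress(image_bytes, level)).decode("ascii")


def text_to_image(text: str) -> bytes:
    """Reverse image_to_text: Base64-decode, then decompress."""
    return zlib.decompress(base64.b64decode(text))
```

Base64 adds about one third to the size of the compressed bytes, so whether this beats storing a binary blob depends on how well the image data compresses and on what column types the database offers.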

ECoMOT : An Efficient Content-based Multimedia Information Retrieval System Using Moving Objects' Trajectories in Video Data (ECoMOT : 비디오 데이터내의 이동체의 궤적을 이용한 효율적인 내용 기반 멀티미디어 정보검색 시스템)

  • Shim Choon-Bo;Chang Jae-Woo;Shin Yong-Won;Park Byung-Rae
    • The KIPS Transactions:PartB / v.12B no.1 s.97 / pp.47-56 / 2005
  • A moving object's spatial location, shape, and size change over time, so a moving object has both temporal and spatial features. This information is among the features of greatest interest in video data. In this paper, we propose an efficient content-based multimedia information retrieval system, called ECoMOT, which enables users to retrieve video data using the trajectory information of moving objects in the video. ECoMOT includes several novel techniques for content-based retrieval using moving objects' trajectories: (1) a multiple trajectory modeling technique that models multiple trajectories composed of several moving objects; (2) a multiple similar trajectory retrieval technique that retrieves similar trajectories by measuring the similarity between two given trajectories composed of several moving objects; (3) a superimposed signature-based trajectory indexing technique that efficiently searches for matching trajectories in a large trajectory database; and (4) a convenient GUI-based interface for trajectory extraction, query generation, and retrieval.
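Superimposed signature indexing, item (3) above, can be illustrated with a generic superimposed-coding filter: each trajectory is reduced to a set of tokens, every token sets a few hashed bits, and the bits are ORed into one signature. The 8-direction movement tokens, the 64-bit width, and the 3 bits per token are assumptions for illustration, not ECoMOT's actual parameters.

```python
import math
import zlib

SIG_BITS = 64  # signature width (assumption)
BITS_PER_TOKEN = 3  # bits set per token (assumption)


def direction_tokens(points):
    """Quantize each movement step into one of 8 compass directions (d0..d7)."""
    tokens = set()
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        angle = math.atan2(y2 - y1, x2 - x1)
        tokens.add(f"d{round(angle / (math.pi / 4)) % 8}")
    return tokens


def signature(tokens):
    """Superimpose (OR) the hashed bit patterns of all tokens into one integer."""
    sig = 0
    for tok in tokens:
        for i in range(BITS_PER_TOKEN):
            # crc32 is deterministic across runs, unlike Python's built-in str hash.
            sig |= 1 << (zlib.crc32(f"{i}:{tok}".encode()) % SIG_BITS)
    return sig


def candidates(index, query_points):
    """Return ids whose signature contains every bit of the query signature."""
    q = signature(direction_tokens(query_points))
    return [tid for tid, sig in index.items() if sig & q == q]
```

The filter can return false positives but never misses a stored trajectory whose token set contains the query's tokens, so the candidates would still be verified with an exact trajectory similarity measure.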

An Efficient Web Search Method Based on a Style-based Keyword Extraction and a Keyword Mining Profile (스타일 기반 키워드 추출 및 키워드 마이닝 프로파일 기반 웹 검색 방법)

  • Joo, Kil-Hong;Lee, Jun-Hwl;Lee, Won-Suk
    • The KIPS Transactions:PartD / v.11D no.5 / pp.1049-1062 / 2004
  • With the popularization of the World Wide Web (WWW), the quantity of web information has increased. Therefore, an efficient search system is needed to offer users accurate results from diverse information. For this reason, it is important to extract and analyze user requirements in a distributed information environment. Conventional search methods use only keywords for web search. However, the search method proposed in this paper adds context information about keywords to make searching more effective. In addition, it extracts keywords with a new keyword extraction method proposed in this paper and performs web searches based on a keyword mining profile generated from the extracted keywords. Unlike conventional methods, which search by a single representative word, the proposed method is much more efficient and accurate, because it searches with an example-based query that includes content information as well as a representative word. Moreover, the method builds a domain keyword list in order to search quickly. A domain keyword is a representative word of a specific domain. The performance of the proposed algorithm is analyzed through a series of experiments to identify its various characteristics.
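One plausible reading of the keyword mining profile is a co-occurrence table mined from the keywords extracted from each document, used to add context keywords to a representative query word. The sketch below assumes exactly that; the data layout and the `k` parameter are hypothetical, and the style-based extraction step itself is not shown.

```python
from collections import Counter, defaultdict


def build_profile(keyword_lists):
    """Count how often each pair of extracted keywords appears in the same document."""
    profile = defaultdict(Counter)
    for keywords in keyword_lists:
        unique = list(dict.fromkeys(keywords))  # de-duplicate, keep first-seen order
        for kw in unique:
            for other in unique:
                if other != kw:
                    profile[kw][other] += 1
    return profile


def expand_query(profile, keyword, k=2):
    """Add the k keywords that co-occur most often with the representative keyword."""
    context = [w for w, _ in profile.get(keyword, Counter()).most_common(k)]
    return [keyword, *context]
```

For instance, if "jaguar" co-occurs with "car" in most documents, the query "jaguar" becomes `["jaguar", "car"]`, which carries the context information that a single representative word lacks.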

Efficient Rotation-Invariant Boundary Image Matching Using the Envelope-based Lower Bound (엔빌로프 기반 하한을 사용한 효율적인 회전-불변 윤곽선 이미지 매칭)

  • Kim, Sang-Pil;Moon, Yang-Sae;Hong, Sun-Kyong
    • The KIPS Transactions:PartD / v.18D no.1 / pp.9-22 / 2011
  • In this paper we present an efficient solution to rotation-invariant boundary image matching. Computing the rotation-invariant distance between image time-series is time-consuming because it requires many Euclidean distance computations, one for every possible rotation. We propose a novel solution that significantly reduces the number of distance computations by using an envelope-based lower bound. To this end, we first show how to construct a single envelope from a query sequence and how to obtain a lower bound of the rotation-invariant distance from that envelope. We then show that the single envelope-based lower bound can reduce the number of distance computations. This approach, however, may perform poorly, because a single envelope that covers all possible rotated sequences is wide and therefore yields a loose lower bound. To solve this problem, we introduce the concept of a rotation interval and use it to generalize the envelope-based lower bound from a single envelope to multiple envelopes. We also propose equi-width division and envelope minimization division as methods for determining rotation intervals in the multiple-envelope approach. Experimental results show that our envelope-based solutions outperform existing solutions by one to two orders of magnitude.
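The envelope-based lower bound can be sketched for sequences of equal length, with rotations modelled as cyclic shifts of the query. At every position, each rotation in an interval lies between the envelope's lower and upper values, so the distance from a data value to that band never exceeds its distance to any single rotation; taking the minimum over intervals therefore lower-bounds the rotation-invariant distance. Only equi-width rotation intervals are shown, and the function names are illustrative.

```python
import math


def rotations(q, start, stop):
    """Cyclic shifts of q by r positions, for r in [start, stop)."""
    return [q[r:] + q[:r] for r in range(start, stop)]


def envelope(seqs):
    """Pointwise upper and lower envelope of equal-length sequences."""
    upper = [max(col) for col in zip(*seqs)]
    lower = [min(col) for col in zip(*seqs)]
    return upper, lower


def lb_envelope(s, upper, lower):
    """Euclidean distance from s to the band [lower, upper]; 0 where s is inside."""
    total = 0.0
    for x, u, l in zip(s, upper, lower):
        if x > u:
            total += (x - u) ** 2
        elif x < l:
            total += (x - l) ** 2
    return math.sqrt(total)


def rotation_invariant_distance(q, s):
    """Exact distance: minimum Euclidean distance over all rotations of q."""
    return min(math.dist(r, s) for r in rotations(q, 0, len(q)))


def multi_envelope_lb(q, s, intervals):
    """Lower bound using equi-width rotation intervals, one envelope per interval."""
    n = len(q)
    width = math.ceil(n / intervals)
    bounds = []
    for start in range(0, n, width):
        upper, lower = envelope(rotations(q, start, min(start + width, n)))
        bounds.append(lb_envelope(s, upper, lower))
    return min(bounds)  # the true minimum lies in some interval
```

With one interval, the envelope spans the whole value range of the query, so the bound is often 0; splitting the rotations into more intervals tightens it, which is the motivation for the multiple-envelope approach. Only data sequences whose bound falls below the search threshold need the exact distance computation.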

Improving Performance of Search Engine By Using WordNet-based Collaborative Evaluation and Hyperlink (워드넷 기반 협동적 평가와 하이퍼링크를 이용한 검색엔진의 성능 향상)

  • Kim, Hyun-Gil;Kim, Jun-Tae
    • The KIPS Transactions:PartB / v.11B no.3 / pp.369-380 / 2004
  • In this paper, we propose a web page weighting scheme based on WordNet-based collaborative evaluation and hyperlinks to improve the precision of a web search engine. Search engines generally use keyword matching to rank web pages. In information retrieval from huge data collections such as the Web, simple word comparison cannot distinguish important documents because too many documents have similar relevance. In this paper, we implement a WordNet-based user interface that helps distinguish different senses of a query word, and we construct a search engine in which implicit evaluations by multiple users are reflected in the ranking by accumulating click counts. Click counts are stored separately for each sense, which enables more accurate search. The weight of each web page, computed from collaborative evaluation and hyperlinks, is reflected in the ranking. Experimental results with several keywords show that the precision of the proposed system is improved compared to conventional search engines.
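The sense-separated click accumulation and its combination with a keyword-match score and hyperlink evidence can be sketched as follows. The log-damped scoring formula, the weights `click_weight` and `link_weight`, and the use of in-link counts as the hyperlink signal are assumptions for illustration; the abstract does not give the exact weighting.

```python
import math
from collections import defaultdict


class SenseClickRanker:
    """Re-rank pages using click counts kept per (query word, WordNet sense)."""

    def __init__(self, click_weight=1.0, link_weight=0.5):
        self.click_weight = click_weight  # hypothetical weighting constants
        self.link_weight = link_weight
        self.clicks = defaultdict(int)  # (word, sense, url) -> click count

    def record_click(self, word, sense, url):
        """Implicit evaluation: a click counts only for the sense the user chose."""
        self.clicks[(word, sense, url)] += 1

    def rank(self, word, sense, match_scores, inlinks):
        """Order urls by keyword match + damped clicks + damped in-link count.

        match_scores: url -> keyword-matching score from the base engine.
        inlinks: url -> number of hyperlinks pointing to the page.
        """

        def score(url):
            return (
                match_scores[url]
                + self.click_weight * math.log1p(self.clicks.get((word, sense, url), 0))
                + self.link_weight * math.log1p(inlinks.get(url, 0))
            )

        # sorted() is stable, so ties keep the base engine's order.
        return sorted(match_scores, key=score, reverse=True)
```

Because clicks are keyed by sense, users who pick the "fruit" sense of "apple" do not shift the ranking seen by users who pick the "company" sense.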

Long-term Location Data Management for Distributed Moving Object Databases (분산 이동 객체 데이타베이스를 위한 과거 위치 정보 관리)

  • Lee, Ho;Lee, Joon-Woo;Park, Seung-Yong;Lee, Chung-Woo;Hwang, Jae-Il;Nah, Yun-Mook
    • Journal of Korea Spatial Information System Society / v.8 no.2 s.17 / pp.91-107 / 2006
  • GALIS, a cluster-based scalable distributed computing system architecture, was proposed to handle the extreme situation of managing a very large volume of positional information for at least millions of moving objects; it consists of multiple data processors, each dedicated to keeping records relevant to a different geographical zone and a different time zone. In this paper, we propose a valid time management scheme and a time-zone shifting scheme, which are essential for realizing the long-term location data subsystem of GALIS but were missing from our previous prototype. We explain how to manage the valid time of moving objects to avoid ambiguity in location information. We also describe a time-zone shifting algorithm with three variations: Real Time-Time Zone Shifting, Batch-Time Zone Shifting, and Table Partitioned Batch-Time Zone Shifting. Through experiments on query processing time and CPU utilization, we show the efficiency of the proposed time-zone shifting schemes.
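Valid-time management and a batch-style time-zone shift can be sketched in memory. Half-open valid-time intervals `[valid_from, valid_to)` are one way to avoid ambiguous locations at report boundaries; the class, the `boundary` parameter, and the two-list partitioning are hypothetical simplifications, not the GALIS data processors or their exact algorithms.

```python
import math
from dataclasses import dataclass


@dataclass
class LocationRecord:
    obj_id: str
    x: float
    y: float
    valid_from: float
    valid_to: float = math.inf  # open-ended until the object's next report


class LocationStore:
    """Current, recent, and archived (older time zone) location records."""

    def __init__(self):
        self.current = {}  # obj_id -> open record
        self.recent = []  # closed records still in the recent time zone
        self.archive = []  # closed records shifted to the older time zone

    def report(self, obj_id, x, y, t):
        """Close the previous interval at t so valid times never overlap."""
        prev = self.current.get(obj_id)
        if prev is not None:
            prev.valid_to = t
            self.recent.append(prev)
        self.current[obj_id] = LocationRecord(obj_id, x, y, t)

    def batch_shift(self, boundary):
        """Move every closed record that ended by `boundary` to the archive."""
        self.archive.extend(r for r in self.recent if r.valid_to <= boundary)
        self.recent = [r for r in self.recent if r.valid_to > boundary]

    def location_at(self, obj_id, t):
        """Return (x, y) valid at time t, or None if unknown."""
        candidates = self.recent + self.archive
        if obj_id in self.current:
            candidates.append(self.current[obj_id])
        for r in candidates:
            if r.obj_id == obj_id and r.valid_from <= t < r.valid_to:
                return (r.x, r.y)
        return None
```

A real-time variant would shift each record as soon as it crosses the boundary instead of in batches; the batch version trades freshness of the partitioning for fewer, larger moves.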