Search | Korea Science

Design and Implementation of a Query Processing Algorithm for Distributed Semistructured Documents Retrieval with Metadata Interface (메타데이타 인터페이스를 이용한 분산된 반구조적 문서 검색을 위한 질의처리 알고리즘 설계 및 구현)

Choe Cuija;Nam Young-Kwang
- Journal of KIISE:Software and Applications / v.32 no.6 / pp.554-569 / 2005
In semistructured distributed documents, the data lack fixed structure and rules, which makes a query processing system very difficult to formalize and implement. Retrieving and processing heterogeneous semistructured documents precisely requires handling multiple mapping types on an element, such as 1:1, 1:N, and N:1, at the same time, and generating a schema from the distributed documents. This paper proposes a query processing algorithm for querying and answering over heterogeneous semistructured data or documents in distributed systems, and implements it with a metadata interface. The algorithm that generates local queries from a global query consists of mapping between global and local nodes, data transformation according to the mapping type, path substitution, and resolving heterogeneity among nodes, all applied to a global input query using metadata information. The mapping, transformation, and path substitution algorithms between the global schema and the local schemas are implemented in a metadata interface called DBXMI (for Distributed Documents XML Metadata Interface). Nodes that share a name but differ in mapping or meaning are resolved by automatically extracting node identification information from the local schema. The system uses Quilt as its XML query language. Experimental tests are reported on three different OEM-model semistructured restaurant documents. The prototype system was developed on Windows with Java and the JavaCC compiler.
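The mapping, transformation, and path substitution steps named in this abstract can be illustrated with a minimal sketch. It is not the DBXMI implementation: the path names, the `MAPPINGS` table, and the helper functions are all hypothetical. It also assumes that 1:N means one global element is built from several local elements, and N:1 means several global elements are derived from one local element.

```python
# Hypothetical metadata: global path -> (mapping type, local paths, transform).
# Every path name and transform below is an illustrative assumption.
def first_value(values):
    return values[0]

def join_values(values):
    return ", ".join(values)

def area_code(values):
    return values[0].split("-", 1)[0]

def subscriber_number(values):
    return values[0].split("-", 1)[1]

MAPPINGS = {
    "/restaurant/name": ("1:1", ["/rest/title"], first_value),
    "/restaurant/address": ("1:N", ["/rest/street", "/rest/city"], join_values),
    "/restaurant/area_code": ("N:1", ["/rest/tel"], area_code),
    "/restaurant/phone": ("N:1", ["/rest/tel"], subscriber_number),
}

def substitute_paths(global_paths, mappings):
    """Path substitution: list the distinct local paths a global query needs."""
    local_paths = []
    for path in global_paths:
        _, targets, _ = mappings[path]
        for target in targets:
            if target not in local_paths:
                local_paths.append(target)
    return local_paths

def to_global(local_record, global_paths, mappings):
    """Data transformation: rebuild global values from one local record."""
    result = {}
    for path in global_paths:
        _, targets, transform = mappings[path]
        result[path] = transform([local_record[t] for t in targets])
    return result

if __name__ == "__main__":
    local_record = {
        "/rest/title": "Seoul Garden",
        "/rest/street": "12 Main St",
        "/rest/city": "Daejeon",
        "/rest/tel": "042-1234567",
    }
    wanted = list(MAPPINGS)
    print(substitute_paths(wanted, MAPPINGS))
    print(to_global(local_record, wanted, MAPPINGS))
```

Keeping the mapping type, the local paths, and the transform in one metadata entry means a new local source only needs a new table, not new query code.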

An Experimental Research on the Design Characteristics and Performance of the Entity-Relationship Model (개체관계 모형의 설계 특성과 성과에 관한 실험적 연구)

정일주
- The Journal of Information Technology and Database / v.6 no.2 / pp.45-57 / 1999
This paper attempts to identify the systematic part of the database design process, specifically when the Entity-Relationship (E-R) model is used. We consider three aspects of that process: the strategy a designer selects to design an E-R diagram, the designer's cognitive style, and the designer's knowledge and preferences. An experiment was carried out to verify the systematic relationship between these three aspects and E-R modeling performance. The target system is a professional baseball system. A normative E-R diagram was constructed based on 48 E-R diagrams produced during the experiment, and ANOVA was used to analyze the results. Significant differences were found among design methods in query-answering capacity and in the completeness of the E-R model. Individual differences in cognitive style were not found to be significantly related to modeling performance.

The Approximate Query Answering Method in Multi-dimensional Data Cube (다차원 데이터큐브의 근사 질의응답 기법)

Lee, Sun-Young;Kim, Yeong-Ju;Bae, Woo-Sik;Lee, Jong-Yun
- Proceedings of the KAIS Fall Conference / 2009.12a / pp.445-448 / 2009
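DSS applications that operate on large volumes of aggregate data need approximate query answering to support efficient, immediate decision making. This study therefore proposes a method that uses FCM (fuzzy c-means) clustering and ANFIS (adaptive neuro-fuzzy inference system). The proposed method builds a model that captures the data characteristics of a multi-dimensional data cube and can return approximate answers to queries. Comparative experiments confirm that the model trained with the proposed method answers approximate queries more accurately than existing methods. The proposed method can therefore reduce storage space and time relative to existing methods while also improving the accuracy of approximate answers.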
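To make the FCM step concrete, the sketch below computes standard fuzzy c-means membership degrees for one value against fixed cluster centers. It covers only the clustering ingredient. The ANFIS model and the way the paper applies clusters to a data cube are not shown, and the one-dimensional data, the centers, and the fuzzifier `m=2.0` are assumptions made for illustration.

```python
def fcm_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of a 1-D point in each cluster center.

    Uses the standard update u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)),
    where d_i is the distance from the point to center i.
    """
    dists = [abs(point - c) for c in centers]
    # A point lying exactly on one or more centers belongs only to those.
    if 0.0 in dists:
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]

if __name__ == "__main__":
    # Hypothetical measure value and cluster centers.
    print(fcm_memberships(2.0, [1.0, 4.0]))
```

Unlike hard clustering, each value gets a degree of membership in every cluster, and the degrees sum to 1.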

A Query Classification Method for Question Answering on a Large-Scale Text Data (대규모 문서 데이터 집합에서 Q&A를 위한 질의문 분류 기법)

엄재홍;장병탁
- Proceedings of the Korean Information Science Society Conference / 2000.04b / pp.253-255 / 2000
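When a user wants a specific answer to a question, the problem with conventional information retrieval is that the search result is not the answer the user is looking for but a set of documents that may (or may not) contain it. To let users get the information they want quickly, without reading every candidate document, a system is needed that returns the actual answer rather than a document set. To this end, question classification based on natural language processing (NLP) and pre-tagging of documents can be added to existing TF-IDF (Term Frequency-Inverse Document Frequency) based retrieval. This study uses the experimental setting of the 'Question & Answer' task of TREC-8, held in 1999 as part of TREC (Text REtrieval Conference), which is organized annually by NIST (National Institute of Standards and Technology) and DARPA (Defense Advanced Research Projects Agency). It shows experimentally that an information retrieval system that classifies questions by their features and pre-tags the documents to be searched can present answers to user questions more accurately and efficiently.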
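The TF-IDF baseline that this abstract builds on can be sketched as follows. It is a generic illustration rather than the authors' system: the whitespace tokenizer, the `log(N / df)` weighting variant, and the sample documents are assumptions. Its output also shows the limitation the abstract describes, since it returns a ranked list of documents and not an answer.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Build one {term: tf * idf} dictionary per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    return [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]

def rank_documents(query, docs):
    """Return document indices ordered by summed TF-IDF of the query terms."""
    vectors = tf_idf_vectors(docs)
    terms = query.lower().split()
    scores = [sum(vec.get(term, 0.0) for term in terms) for vec in vectors]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)

if __name__ == "__main__":
    docs = [
        "the capital of france is paris",
        "the capital of korea is seoul",
        "paris is a city",
    ]
    print(rank_documents("capital france", docs))
```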

A Study on Word Semantic Categories for Natural Language Question Type Classification and Answer Extraction (자연어 질의유형 판별과 응답 추출을 위한 어휘 의미 체계에 관한 연구)

Yoon Sung-Hee
- Journal of the Korea Academia-Industrial cooperation Society / v.5 no.6 / pp.539-545 / 2004
For a question answering system that extracts an answer to a user's natural language question and returns it, classifying the question type from the user's natural language query is a very important step. This paper proposes a question and answer type classifier that uses interrogatives and word semantic categories instead of complicated classification rules and huge dictionaries. Synonyms and postfix information are also used for question type classification. Experiments show that the semantic categories help classify the question type even for questions without interrogatives.
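A minimal sketch of the idea: try the interrogative first, then fall back to the semantic category of a content word. The two lookup tables, the type labels, and the English examples are hypothetical. The paper's actual category system, and its use of synonyms and postfix information, are not reproduced here.

```python
import re

# Hypothetical lookup tables, for illustration only.
INTERROGATIVE_TYPES = {"who": "PERSON", "where": "LOCATION", "when": "DATE"}
SEMANTIC_CATEGORIES = {
    "author": "PERSON",
    "president": "PERSON",
    "city": "LOCATION",
    "country": "LOCATION",
    "year": "DATE",
    "population": "QUANTITY",
}

def classify_question(question):
    """Return an answer type from interrogatives, else from word categories."""
    tokens = re.findall(r"[a-z]+", question.lower())
    for token in tokens:
        if token in INTERROGATIVE_TYPES:
            return INTERROGATIVE_TYPES[token]
    # No informative interrogative: use the semantic category of a content word.
    for token in tokens:
        if token in SEMANTIC_CATEGORIES:
            return SEMANTIC_CATEGORIES[token]
    return "UNKNOWN"

if __name__ == "__main__":
    for q in ["Who wrote Hamlet?", "Name the author of Hamlet.",
              "Tell me the population of Seoul."]:
        print(q, "->", classify_question(q))
```

The second and third questions contain no interrogative, which is the case where the abstract reports that semantic categories help.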

An Intelligent Web Service for Ontology-Based Query-Answering (온톨로지 기반의 질의-응답을 위한 지능형 웹서비스)

Jin, Hoon;Kim, In-Cheol
- Proceedings of the Korean Information Science Society Conference / 2005.07b / pp.640-642 / 2005
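This paper describes an intelligent web service for ontology-based query answering. The web service works through the exchange of OWL-QL messages between a querying agent and an answering agent. OWL-QL is a formal language and protocol for query-answering dialogues among Semantic Web agents that use knowledge bases expressed in OWL. In OWL-QL, the answering agent performs automated reasoning to answer a query given by the querying agent. The paper explains the function and structure of each agent that makes up the system and discusses the usefulness of the graphical OWL-QL query editor included in the querying agent.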

Semantic Query Expansion based on a Question Category Concept List in QA system (질의 응답 시스템에서 질의 카테고리별 개념리스트 구축에 기반한 의미적 질의 확장)

김혜정;강보영;박성배;이상조
- Proceedings of the Korean Information Science Society Conference / 2004.10a / pp.178-180 / 2004
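A question answering (QA) system tries to extract more accurate answers by using the answer type that a question asks for and the terms used in the question. However, the terms in the question often do not appear verbatim in the answer sentence of a document: they may show up as different words with the same meaning, or in a category with different grammatical information, which makes answer extraction difficult. This paper therefore proposes an effective semantic query expansion method built on a per-category concept list for questions. The proposed method first identifies the pattern of the question sentence and the type of information the question asks for, and builds question categories and a concept list for each category. It then uses the constructed categories and lists to learn question types. When a new question arrives, it classifies the question into the matching concept category and expands the query concept by concept based on the concept list. To evaluate performance, experiments used TREC-9 questions and 42,654 Wall Street Journal (WSJ) documents from 1991 in the TREC collection. A system without query expansion scored 0.223 on MRR (mean reciprocal rank), while the proposed system reached an improved 0.50.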
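The sketch below shows two pieces of this setup under stated assumptions: appending concept terms for a question's category, and scoring ranked answers with mean reciprocal rank. The `CONCEPT_LISTS` contents and all sample data are hypothetical. The paper's learned categories and its TREC-9 evaluation data are not reproduced.

```python
# Hypothetical concept lists keyed by question category.
CONCEPT_LISTS = {
    "LOCATION": ["city", "country", "located"],
    "PERSON": ["born", "named"],
}

def expand_query(terms, category, concept_lists=CONCEPT_LISTS):
    """Append the category's concept terms that the query does not already use."""
    extra = [t for t in concept_lists.get(category, []) if t not in terms]
    return list(terms) + extra

def mean_reciprocal_rank(ranked_answers, gold_answers):
    """Average of 1 / rank of the first correct answer (0 if none is correct)."""
    total = 0.0
    for candidates, gold in zip(ranked_answers, gold_answers):
        for rank, candidate in enumerate(candidates, start=1):
            if candidate == gold:
                total += 1.0 / rank
                break
    return total / len(gold_answers)

if __name__ == "__main__":
    print(expand_query(["where", "is", "kaist"], "LOCATION"))
    ranked = [["a", "b"], ["x", "y", "z"], ["p"]]
    print(mean_reciprocal_rank(ranked, ["a", "z", "q"]))
```

Under this measure, 1.0 means the correct answer is always ranked first and 0.5 corresponds to the correct answer sitting at rank 2 on average, which gives a sense of scale for the reported move from 0.223 to 0.50.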

A Study on the Dense Vector Representation of Query-Passage for Open Domain Question Answering (오픈 도메인 질의응답을 위한 질문-구절의 밀집 벡터 표현 연구)

Minji Jung;Saebyeok Lee;Youngjune Kim;Cheolhun Heo;Chunghee Lee
- Annual Conference on Human and Language Technology / 2022.10a / pp.115-121 / 2022
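Retrieving passages relevant to a question is needed for the retrieval stage of open-domain question answering. Traditional methods retrieve passages with sparse vector representations based on term frequency-inverse document frequency (TF-IDF), an information retrieval technique. However, sparse vector representations are not only long but also have the weakness that they cannot retrieve words or tokens that do not appear in the question. Research on dense vector representations is addressing these weaknesses, but most of that work trains on English datasets. This study therefore investigates dense vector representations trained on Korean datasets and compares the performance of transfer-learned models under several negative sample extraction methods. It also runs and compares experiments that add a re-ranking interaction layer, previously used for dense retrieval in dialogue response selection tasks. Because training dense vector representation models is a challenging problem, a variety of further attempts will likely be needed.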
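As a rough illustration of why negative samples matter, the sketch below ranks passages by inner product and computes a softmax-style contrastive loss for one question. It is a generic formulation, not the models or sampling methods compared in the paper. The vectors are hand-written stand-ins for encoder outputs, and all function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rank_passages(question_vec, passage_vecs):
    """Return passage indices ordered by inner product with the question."""
    return sorted(
        range(len(passage_vecs)),
        key=lambda i: dot(question_vec, passage_vecs[i]),
        reverse=True,
    )

def contrastive_loss(question_vec, positive_vec, negative_vecs):
    """Negative log-likelihood of the positive passage among all candidates."""
    scores = [dot(question_vec, positive_vec)]
    scores += [dot(question_vec, neg) for neg in negative_vecs]
    log_normalizer = math.log(sum(math.exp(s) for s in scores))
    return log_normalizer - scores[0]

if __name__ == "__main__":
    question = [1.0, 0.0]
    print(rank_passages(question, [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]))
    # An unrelated negative versus a hard negative close to the question.
    print(contrastive_loss(question, [1.0, 0.0], [[0.0, 1.0]]))
    print(contrastive_loss(question, [1.0, 0.0], [[0.9, 0.0]]))
```

The hard negative produces the larger loss, so it pushes the encoder harder during training. That is one reason the choice of negative sampling method can change model quality.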

A Kinematic Approach to Answering Similarity Queries on Complex Human Motion Data (운동학적 접근 방법을 사용한 복잡한 인간 동작 질의 시스템)

Han, Hyuck;Kim, Shin-Gyu;Jung, Hyung-Soo;Yeom, Heon-Y.
- Journal of Internet Computing and Services / v.10 no.4 / pp.1-11 / 2009
Data retrieval from large motion databases has recently drawn concern in both the database and graphics communities, because the high dimensionality of motion data implies high costs. Under these circumstances, finding an effective distance measure and an efficient query processing method for such data is a challenging problem. This paper presents a motion query processing system, SMoFinder (Similar Motion Finder), which incorporates a novel kinematic distance measure and an efficient indexing strategy via adaptive frame segmentation. To this end, we regard human motions as multi-linkage kinematics and propose the weighted Minkowski distance metric. For efficient indexing, we devise a new adaptive segmentation method that chooses representative frames among similar frames and stores only the chosen frames instead of all frames. For efficient search, we propose a new search method that processes k-nearest neighbor queries over only the representative frames. Our experimental results show that the size of motion databases is reduced greatly ($\times 1/25$), while the search capability of SMoFinder is equal or superior to that of other systems.
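A weighted Minkowski distance and a brute-force k-nearest neighbor search over representative frames might look like the sketch below. The general form $(\sum_i w_i |x_i - y_i|^p)^{1/p}$ is standard, but how SMoFinder chooses weights per joint and how it indexes frames are not given in the abstract, so the weights, frame vectors, and function names here are assumptions.

```python
def weighted_minkowski(x, y, weights, p=2.0):
    """Weighted Minkowski distance: (sum_i w_i * |x_i - y_i| ** p) ** (1 / p)."""
    total = sum(w * abs(a - b) ** p for w, a, b in zip(weights, x, y))
    return total ** (1.0 / p)

def k_nearest(query, frames, weights, k, p=2.0):
    """Indices of the k frames closest to the query (linear scan, no index)."""
    order = sorted(
        range(len(frames)),
        key=lambda i: weighted_minkowski(query, frames[i], weights, p),
    )
    return order[:k]

if __name__ == "__main__":
    # Hypothetical two-dimensional "frames"; real motion frames have many joints.
    representative_frames = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
    print(weighted_minkowski([0.0, 0.0], [3.0, 4.0], [1.0, 1.0]))
    print(k_nearest([0.0, 0.0], representative_frames, [1.0, 1.0], k=2))
```

With equal weights and `p=2.0` this reduces to Euclidean distance. Larger weights on some dimensions make differences in those dimensions count more.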

Development of a Regulatory Q&A System for KAERI Utilizing Document Search Algorithms and Large Language Model (거대언어모델과 문서검색 알고리즘을 활용한 한국원자력연구원 규정 질의응답 시스템 개발)

Hongbi Kim;Yonggyun Yu
- Journal of Korea Society of Industrial Information Systems / v.28 no.5 / pp.31-39 / 2023
The evolution of Natural Language Processing (NLP) and the rise of large language models (LLMs) like ChatGPT have paved the way for specialized question-answering (QA) systems tailored to specific domains. This study outlines a system that combines an LLM with document search algorithms to interpret and address user inquiries using documents from the Korea Atomic Energy Research Institute (KAERI). Initially, the system refines multiple documents for optimized search and analysis, breaking the content into manageable paragraphs suitable for the language model's processing. Each paragraph's content is converted into a vector via an embedding model and archived in a database. Upon receiving a user query, the system matches the vector extracted from the question against the stored vectors, pinpointing the most pertinent content. The chosen paragraphs, combined with the user's query, are then processed by the language generation model to formulate a response. Tests encompassing a spectrum of questions verified the system's proficiency in discerning question intent, understanding diverse documents, and delivering rapid and precise answers.
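The pipeline described here (split into paragraphs, embed, store, match the query, hand the selected paragraphs to the generator) can be sketched with a toy stand-in for the embedding model. Everything below is an assumption made for illustration: the bag-of-words `embed` function replaces a real embedding model, the in-memory list replaces the database, the sample regulations are invented, and the call to the language model is left out.

```python
import math
from collections import Counter

def split_paragraphs(text):
    """Break a document into non-empty paragraphs on blank lines."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def embed(text):
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(text.lower().split())

def cosine(u, v):
    shared = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return shared / (norm_u * norm_v) if norm_u and norm_v else 0.0

def retrieve(query, store, top_k=1):
    """Return the top_k stored paragraphs most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]

def build_prompt(query, paragraphs):
    """Combine retrieved paragraphs and the question for a generation model."""
    context = "\n".join(paragraphs)
    return f"Context:\n{context}\n\nQuestion: {query}"

if __name__ == "__main__":
    document = (
        "Travel expenses must be filed within ten days.\n\n"
        "Lab access requires a safety briefing."
    )
    store = [(p, embed(p)) for p in split_paragraphs(document)]
    question = "when must travel expenses be filed"
    print(build_prompt(question, retrieve(question, store)))
```

Embedding the paragraphs once and reusing the stored vectors keeps per-query work down to one embedding call plus a similarity search.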
https://doi.org/10.9723/jksiis.2023.28.5.031

Search Result 55, Processing Time 0.025 seconds
