Search | Korea Science

Efficient Multi-Step k-NN Search Methods Using Multidimensional Indexes in Large Databases (대용량 데이터베이스에서 다차원 인덱스를 사용한 효율적인 다단계 k-NN 검색)

Lee, Sanghun;Kim, Bum-Soo;Choi, Mi-Jung;Moon, Yang-Sae
- Journal of KIISE / v.42 no.2 / pp.242-254 / 2015
In this paper, we address the problem of improving the performance of multi-step k-NN search using multidimensional indexes. Because lower-dimensional transformations lose information, existing multi-step k-NN search solutions produce a large tolerance (i.e., a large search range) and therefore incur a large number of candidates, which are retrieved by a range query. These many candidates lead to overwhelming I/O and CPU overheads in the postprocessing step. To overcome this problem, we propose two efficient solutions that improve search performance by reducing the tolerance of the range query and, accordingly, the number of candidates. First, we propose a tolerance reduction-based (approximate) solution that forcibly decreases the tolerance, which is determined by a k-NN query on the index, by the average ratio of high- and low-dimensional distances. Second, we propose a coefficient control-based (exact) solution that uses c·k instead of k in the k-NN query to obtain a tighter tolerance and performs the range query using this tighter tolerance. Experimental results show that the proposed solutions significantly reduce the number of candidates and, accordingly, improve search performance in comparison with the existing multi-step k-NN solution.
https://doi.org/10.5626/JOK.2015.42.2.242 KSCI
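
The baseline multi-step k-NN procedure that this abstract refers to can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: linear scans stand in for the multidimensional index, the lower-dimensional transformation is assumed to be "keep the first `d_low` coordinates" (which never increases Euclidean distance), and the names `reduce_dim` and `multistep_knn` are hypothetical. The two proposed tolerance-reducing solutions are not reproduced here.

```python
import heapq
import math

def reduce_dim(v, d_low):
    # Assumed transformation: keep the first d_low coordinates.
    # Dropping coordinates can only shrink Euclidean distance (lower bound).
    return v[:d_low]

def multistep_knn(data, query, k, d_low):
    low = [reduce_dim(v, d_low) for v in data]
    q_low = reduce_dim(query, d_low)
    ids = range(len(data))
    # Step 1: k-NN query in the low-dimensional space (index stand-in).
    seeds = heapq.nsmallest(k, ids, key=lambda i: math.dist(low[i], q_low))
    # Step 2: tolerance = largest high-dimensional distance among the seeds.
    tolerance = max(math.dist(data[i], query) for i in seeds)
    # Step 3: low-dimensional range query with that tolerance.
    candidates = [i for i in ids if math.dist(low[i], q_low) <= tolerance]
    # Step 4: postprocessing with exact high-dimensional distances.
    result = heapq.nsmallest(k, candidates,
                             key=lambda i: math.dist(data[i], query))
    return result, len(candidates)
```

The second return value is the candidate count, which is the quantity the abstract says a smaller tolerance is meant to reduce.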

Efficient Processing of MAX-of-SUM Queries in OLAP (OLAP에서 MAX-of-SUM 질의의 효율적인 처리 기법)

Cheong, Hee-Jeong;Kim, Dong-Wook;Kim, Jong-Soo;Lee, Yoon-Joon;Kim, Myoung-Ho
- Journal of KIISE:Databases / v.27 no.2 / pp.165-174 / 2000
Recent research on range queries in OLAP is concerned only with applying an aggregation operator over a certain region. However, data analysts in the real world need not only the simple range query pattern but also an extended range query pattern that finds ranges satisfying a special condition specified with several aggregation operators. In this work, we define the general form of the extended range query and propose an efficient processing method for the 'MAX-of-SUM' query, which is the representative form of the extended range query pattern. The MAX-of-SUM query finds the range with the maximum range sum value in a data cube, where the size of the range is given. The proposed query processing method is based on predicting the scope of the range sum values. That is, the search space during query processing can be reduced by using the result of the prediction, and hence the query response time is also reduced.
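
To make the query definition concrete, here is a small sketch of a MAX-of-SUM query over a two-dimensional cube. It is only an exhaustive baseline built on a prefix-sum array and does not implement the prediction-based pruning that the paper proposes; the function names and the two-dimensional restriction are assumptions made for illustration.

```python
def prefix_sums(cube):
    # p[i][j] holds the sum of cube[0..i-1][0..j-1].
    rows, cols = len(cube), len(cube[0])
    p = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            p[i + 1][j + 1] = cube[i][j] + p[i][j + 1] + p[i + 1][j] - p[i][j]
    return p

def range_sum(p, r, c, h, w):
    # Sum of the h x w range whose top-left cell is (r, c).
    return p[r + h][c + w] - p[r][c + w] - p[r + h][c] + p[r][c]

def max_of_sum(cube, h, w):
    # Try every h x w range and keep the one with the largest sum.
    p = prefix_sums(cube)
    best = None
    for r in range(len(cube) - h + 1):
        for c in range(len(cube[0]) - w + 1):
            s = range_sum(p, r, c, h, w)
            if best is None or s > best[0]:
                best = (s, r, c)
    return best  # (range sum, top-left row, top-left column)
```

For the cube `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]` and a 2 x 2 range, the best range is the bottom-right block with sum 28.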

EP2 Labeling Scheme for XML Data (XML 데이타를 위한 EP2 레이블링 스킴)

진주용;배진욱;이석호
- Proceedings of the Korean Information Science Society Conference / 2004.10b / pp.79-81 / 2004
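With a range-based labeling scheme, the ancestor-descendant relationship between any two nodes can be determined easily, so queries in XPath or XQuery form can be processed efficiently. However, in dynamic settings where nodes are inserted, all or part of the labels may unavoidably have to be reassigned (re-labeling). In this paper, we propose the EP2 (extended preorder & postorder) labeling scheme, which improves on the Dietz labeling scheme. Using the same storage space, the proposed scheme handles dynamic updates better than range-based labeling schemes, and it can process structural queries efficiently using existing structural join algorithms.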
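
As background, the preorder/postorder test that the Dietz scheme relies on can be sketched as below. This shows only the classic labeling (node u is an ancestor of v when u comes earlier in preorder and later in postorder); it is not the EP2 scheme, whose details the abstract does not give. The tree representation and function names are assumptions.

```python
def label_tree(children, root):
    # children maps a node to the list of its child nodes.
    labels = {}
    counters = {"pre": 0, "post": 0}

    def visit(node):
        pre = counters["pre"]          # preorder rank: assigned on entry
        counters["pre"] += 1
        for child in children.get(node, []):
            visit(child)
        labels[node] = (pre, counters["post"])  # postorder rank: on exit
        counters["post"] += 1

    visit(root)
    return labels

def is_ancestor(labels, u, v):
    # u is a proper ancestor of v iff pre(u) < pre(v) and post(u) > post(v).
    return labels[u][0] < labels[v][0] and labels[u][1] > labels[v][1]
```

Inserting a node into such a tree shifts the ranks of later nodes, which is the re-labeling cost the abstract describes.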

A Region Splitting Strategy for Spatial Access Structures Using Transformation Techniques (변환기법을 이용한 공간 액세스 구조의 영역분할 전략)

Yoon, Dong-Ha;Lee, Jong-Hak
- Proceedings of the Korea Information Processing Society Conference / 2002.04a / pp.109-112 / 2002
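Physical database design is the process of determining the access structures of a database so as to provide optimal query processing performance. In this paper, we present a region splitting strategy for the physical database design of spatial access structures that use transformation techniques. Such a structure manages spatial objects in the original space by transforming them into point objects in a transformation space that has twice as many dimensions as the original space. First, we show that every spatial query posed in the original space is transformed into a single type of range query in the transformation space. We then propose a region splitting strategy that improves query processing performance by exploiting the relationship between the shape of the query region, where this range query lies in the transformation space, and the shape of the page region, where a data page lies. According to the performance evaluation, an optimal spatial access structure could be constructed for a given query pattern, and for the four-dimensional transformation space of a two-dimensional original space, query processing performance improved by a factor of five or more, depending on the query type.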
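
The transformation idea can be illustrated with a short sketch: a two-dimensional rectangle becomes a four-dimensional point, and a window intersection query becomes one range query over those points. The coordinate order `(xlo, xhi, ylo, yhi)` and the function names are assumptions for illustration; the paper's region splitting strategy itself is not shown.

```python
import math

def to_point(rect):
    # rect = ((xlo, xhi), (ylo, yhi)) in the 2D original space.
    (xlo, xhi), (ylo, yhi) = rect
    return (xlo, xhi, ylo, yhi)  # point in the 4D transformation space

def intersection_as_range(query_rect):
    # A rectangle intersects the query window iff
    # xlo <= q.xhi, xhi >= q.xlo, ylo <= q.yhi and yhi >= q.ylo,
    # which is a (half-open) box in the transformation space.
    (qxlo, qxhi), (qylo, qyhi) = query_rect
    return ((-math.inf, qxhi), (qxlo, math.inf),
            (-math.inf, qyhi), (qylo, math.inf))

def in_range(point, box):
    # Plain range-query membership test for one transformed point.
    return all(lo <= x <= hi for x, (lo, hi) in zip(point, box))
```

For example, the window `((1, 3), (1, 3))` matches the rectangle `((0, 2), (0, 2))` but not `((4, 5), (0, 1))`.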

Topic based Question-Answering System using Real-Time Search Terms (실시간 검색어를 이용한 주제어 기반의 질의응답시스템)

Song, Il-Hyeon;Kang, Sang-Woo;Seo, Jung-Yun
- Annual Conference on Human and Language Technology / 2011.10a / pp.33-37 / 2011
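In this paper, we propose a topic-based question-answering system that uses real-time search terms. By restricting the scope of the user's query with a topic word, the proposed system is expected to reduce errors that can arise during query processing. To perform topic-based question answering, the system goes through indexing of the target documents, query type determination, and ranking of the search results. The proposed method achieved an average performance improvement of 69% per query type at P@5 over the baseline system.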
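
For readers unfamiliar with the metric, P@5 (precision at 5) is the fraction of the top five ranked results that are relevant. A minimal sketch, with a hypothetical function name and toy document identifiers:

```python
def precision_at_k(ranked, relevant, k=5):
    # Share of the first k ranked items that appear in the relevant set.
    top = ranked[:k]
    return sum(1 for doc in top if doc in relevant) / k
```

With the ranking `["a", "b", "c", "d", "e", "f"]` and relevant set `{"a", "c", "f"}`, two of the top five results are relevant, so P@5 is 0.4.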

An Association Rules Mining System based-on SQL (SQL을 이용한 연관 규칙 탐사 시스템)

전수정;김영지;우용태
- Proceedings of the Korea Database Society Conference / 2000.11a / pp.89-94 / 2000
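In this paper, we design and implement an association rule mining system. The system uses the standard query language of relational databases to mine various forms of association rules over itemsets that satisfy query conditions given by the user. The query processing module dynamically constructs queries that satisfy the user's conditions, so the range of the target transaction database used for association rule mining can be controlled. An association rule mining algorithm is used to generate the candidate itemsets from which association rules are discovered. For this algorithm, we propose an efficient method that uses arrays to process the candidate itemsets that can be generated from a single transaction.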
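
The step of generating candidate itemsets per transaction and counting their support can be sketched as follows. This is a plain Python stand-in, not the SQL-based system or the array-based method described in the abstract; the function names and the absolute `min_support` threshold are assumptions.

```python
from collections import Counter
from itertools import combinations

def count_itemsets(transactions, size):
    # Enumerate every itemset of the given size that one transaction
    # can produce, and count in how many transactions it occurs.
    counts = Counter()
    for t in transactions:
        for itemset in combinations(sorted(set(t)), size):
            counts[itemset] += 1
    return counts

def frequent_itemsets(transactions, size, min_support):
    # Keep only itemsets whose count reaches the minimum support.
    counts = count_itemsets(transactions, size)
    return {s: c for s, c in counts.items() if c >= min_support}
```

Restricting `transactions` to the rows selected by a user condition corresponds to the abstract's point about controlling the range of the target transaction database.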

BITMAP INDEX and Searching Strategies On MMDB Adapt To Indoor Environment (MMDB에서의 실내 환경에 적합한 BITMAP INDEX와 탐색기법)

Jeon Hyeon-Sig;Park Hyun-Ju
- Proceedings of the Korea Information Processing Society Conference / 2004.11a / pp.39-42 / 2004
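Existing research on spatial queries and indexing is mainly based on outdoor environments. Indoor environments differ from outdoor ones in their query characteristics and environmental factors. Typical characteristics of indoor queries are that the current location of an object must be identified and reported immediately, and that the query range is locally restricted. In this paper, to address the problems of existing research, we use a main-memory DBMS and propose a bitmap index technique that adapts efficiently to searching for object locations in indoor environments.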

Range Query Processing of Distributed Moving Object Databases using Scheduling Technique (스케쥴링 기법을 이용한 분산 이동 객체 데이타베이스의 범위 질의 처리)

Jeon, Se-Gil;Hwang, Jae-Il;Nah, Youn-Mook
- Journal of Korea Spatial Information System Society / v.6 no.2 s.12 / pp.51-62 / 2004
Recently, location-based services (LBS) for moving customers have become one of the most important services in the mobile communication area. Moving object applications involve many update operations, and the update load is concentrated unevenly on particular areas. The primary processing in LBS applications is spatio-temporal range queries. To improve the throughput of spatio-temporal range queries, the disk I/O time in query processing should be reduced. In this paper, we adopt the non-uniform two-level grid index structures of the GALIS architecture, which are designed to minimize update operations. To improve query processing throughput, we propose a query scheduling technique that uses spatial and temporal relationships, and a combined spatio-temporal query processing method that uses time zone concepts. Experimental results for range queries with different query ranges show the performance tradeoffs of the proposed methods.

Reading Comprehension requiring Discrete Reasoning Over Paragraphs for Korean (단락에 대한 이산 추론을 요구하는 한국어 기계 독해)

Kim, Gyeong-min;Seo, Jaehyung;Lee, Soomin;Lim, Heui-seok
- Annual Conference on Human and Language Technology / 2021.10a / pp.439-443 / 2021
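Machine reading comprehension is a natural language processing task in which, given a paragraph and a query, the answer is found within the paragraph. Driven by pre-trained language models, it has recently advanced rapidly on benchmark datasets and has surpassed human performance on certain datasets. However, these results concern information extracted from spans within the paragraph, and such models are limited in answering queries that require actual computation. In this paper, we aim to verify the effectiveness of machine reading comprehension models that can answer not only within existing spans but also for paragraphs and queries that require discrete reasoning over computation. To this end, we built the KoDROP (Korean DROP) dataset by translating 1,794 question-answer pairs from the English DROP (Discrete Reasoning Over the content of Paragraphs) dataset into Korean with Google Translator API v2 and refining them. By incorporating semantic tags for performing operations with reference to the paragraph and query into the Korean KoBERT and KoELECTRA models, we created the numerically aware KoNABERT and KoNAELECTRA models. Experimental results show that the KoDROP dataset demands a more comprehensive understanding of paragraphs and more computational information than existing machine reading comprehension datasets, and that KoNAELECTRA, which recorded the highest performance, improved markedly on KoBERT by 19.20 in both F1 and EM.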

A Sequential Indexing Method for Multidimensional Range Queries (다차원 범위 질의를 위한 순차 색인 기법)

Cha Guang-Ho
- Journal of KIISE:Databases / v.32 no.3 / pp.254-262 / 2005
This paper presents a new sequential indexing method called segment-page indexing (SP-indexing) for multidimensional range queries. The design objectives of SP-indexing are twofold: (1) improving the range query performance of multidimensional indexing methods (MIMs) and (2) providing a compromise between optimal index clustering and the overhead of full index reorganization. Although more than ten years of database research has produced a great variety of MIMs, most efforts have focused on data-level clustering, and there have been fewer attempts to cluster indexes. As a result, most relevant index nodes are widely scattered on disk, and many random disk accesses are required during a search. SP-indexing avoids such scattering by storing the relevant nodes contiguously in a segment that contains a sequence of contiguous disk pages, and it improves performance by offering sequential access within a segment. Experimental results demonstrate that, in terms of total elapsed time, SP-indexing improves query performance by up to several times compared with traditional MIMs that use small disk pages, and that it reduces the waste of disk bandwidth caused by simply using large pages.
KSCI
