• Title/Summary/Keyword: approximate query processing (근사 질의 처리)


Indexing and Searching for Reduced-Dimensional Vectors (차원 축소 벡터들을 위한 인덱싱 및 검색)

  • Jeong, Seung-Do;Kim, Sang-Wook;Choi, Byung-Uk
    • Journal of KIISE:Databases / v.37 no.1 / pp.44-49 / 2010
  • In this paper, we first address the problems associated with indexing and searching for reduced-dimensional vectors, which are reduced by using a combination of angle approximation and dimension grouping. Then, we propose a novel method to solve the problems. We also show the superiority of the proposed method by performing extensive experiments with synthetic and real-life data sets.

Max Error Histogram Construction for Interval Data (구간 데이타에 대한 최대 에러 히스토그램 구축)

  • Lee, Ho-Seok;Shim, Kyu-Seok;Yi, Byoung-Kee
    • Proceedings of the Korean Information Science Society Conference / 2006.10c / pp.33-38 / 2006
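  • A histogram is one of the techniques for effectively summarizing original data, and it is widely used for selectivity estimation and approximate query processing. Existing histogram construction algorithms are applicable only to point data, in which each item is represented by a single value. In everyday life, however, interval data such as the temperature over a day or stock prices are as common as point data. In this paper, we extend an existing histogram construction algorithm for the max error metric to interval data. Experiments with synthetic data show that the proposed algorithm performs better than a naive extension of histograms for point data.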


An Improved Split Algorithm for Indexing of Moving Object Trajectories (이동 객체 궤적의 색인을 위한 개선된 분할 알고리즘)

  • Jeon, Hyun-Jun;Park, Ju-Hyun;Park, Hee-Suk;Cho, Woo-Hyun
    • The KIPS Transactions:PartD / v.16D no.2 / pp.161-168 / 2009
  • Recently, with the development of wireless network technology, the use of location-based services that collect the position information of moving objects and apply it in real life has been increasing. Accordingly, new index structures are required to efficiently retrieve the consecutive positions of moving objects. This paper presents an improved trajectory split algorithm that efficiently supports spatio-temporal range queries on index structures that use Minimum Bounding Rectangles (MBRs) as trajectory approximations. We consider the volume of Extended Minimum Bounding Rectangles (EMBRs) to be determined by the average size of range queries. We also use a priority queue to speed up the process. In general, this algorithm gives sub-optimal solutions with respect to search space. Our improved trajectory split algorithm reduces the volume of EMBRs more than the previously proposed split algorithm.

A DNA Index Structure using Frequency and Position Information of Genetic Alphabet (염기문자의 빈도와 위치정보를 이용한 DNA 인덱스구조)

  • Kim Woo-Cheol;Park Sang-Hyun;Won Jung-Im;Kim Sang-Wook;Yoon Jee-Hee
    • Journal of KIISE:Databases / v.32 no.3 / pp.263-275 / 2005
  • In large DNA databases, indexing techniques are widely used for rapid approximate sequence searching. However, most indexing techniques require more space than the original databases and are difficult to integrate seamlessly with a DBMS. In this paper, we suggest a space-efficient, disk-based indexing and query processing algorithm for approximate DNA sequence searching, specifically exact match queries, wildcard match queries, and k-mismatch queries. Our indexing method places a sliding window at every possible location of a DNA sequence and extracts its signature by considering the occurrence frequency of each nucleotide. It then stores the set of signatures in a multi-dimensional index, such as an R*-tree. In particular, by assigning a weight to each position of a window, it prevents signatures from being concentrated around a few spots in the index space. Our query processing algorithm converts a query sequence into a multi-dimensional rectangle and searches the index for the signatures that overlap the rectangle. Experiments with real biological data sets revealed that the proposed method is at least three times, twice, and several orders of magnitude faster than the suffix-tree-based method for exact match, wildcard match, and k-mismatch queries, respectively.
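
The signature extraction described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' method: the function name `window_signatures`, the window length, and the linear position weights (position i contributes i + 1) are all assumptions, and storing the signatures in an R*-tree is not shown.

```python
from typing import List, Tuple

NUCLEOTIDES = "ACGT"

def window_signatures(seq: str, w: int, weighted: bool = False) -> List[Tuple[float, ...]]:
    """Return one 4-dimensional signature per sliding-window position in seq.

    Each coordinate is the (optionally position-weighted) occurrence count
    of one nucleotide inside the window.
    """
    signatures = []
    for start in range(len(seq) - w + 1):
        counts = {n: 0.0 for n in NUCLEOTIDES}
        for offset, ch in enumerate(seq[start:start + w]):
            if ch in counts:
                # Hypothetical weighting scheme: later positions weigh more.
                counts[ch] += (offset + 1) if weighted else 1.0
        signatures.append(tuple(counts[n] for n in NUCLEOTIDES))
    return signatures

if __name__ == "__main__":
    print(window_signatures("ACGTAC", 4))
    print(window_signatures("ACGTAC", 4, weighted=True))
```

With plain counts, the three windows of `"ACGTAC"` all collapse onto the same point, while the weighted variant separates them, which is the concentration problem the abstract says position weights are meant to avoid.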

Efficient Indexing Structure for Moving Object Trajectories (이동객체궤적에 대한 효율적인 색인구조)

  • Kim, Gyu-Jae;Cho, Woo-hyun
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2015.10a / pp.360-363 / 2015
  • For n-dimensional spatial data, the Minimum Bounding Rectangle (MBR) has been used to handle moving object trajectory data. However, this method gives an inaccurate approximation, so it creates a large amount of dead space and performs unnecessary operations when processing a query. In this paper, we propose a new index structure based on approximation. We developed an algorithm that builds the index structure using the Douglas-Peucker algorithm and carried out a comparison experiment.
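
For reference, the standard Douglas-Peucker line simplification that this abstract mentions can be sketched as below. The abstract does not describe how the index is built from the simplified trajectory, so this shows only the textbook algorithm on 2-D points; the names `douglas_peucker` and `epsilon` are illustrative choices.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def point_line_distance(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the line through a and b (to a if a == b)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dx * (ay - py) - dy * (ax - px)) / length

def douglas_peucker(points: List[Point], epsilon: float) -> List[Point]:
    """Simplify a polyline, keeping points farther than epsilon from the chord."""
    if len(points) < 3:
        return list(points)
    # Find the interior point farthest from the chord between the endpoints.
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = point_line_distance(points[i], points[0], points[-1])
        if d > max_dist:
            max_dist, index = d, i
    if max_dist <= epsilon:
        return [points[0], points[-1]]
    # Otherwise keep that point and simplify both halves recursively.
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right

if __name__ == "__main__":
    trajectory = [(0, 0), (1, 0.05), (2, 0), (3, 3), (4, 0)]
    print(douglas_peucker(trajectory, 0.5))
```

A larger `epsilon` keeps fewer points and gives a coarser approximation of the trajectory.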


Extensions of Histogram Construction Algorithms for Interval Data (구간 데이타에 대한 히스토그램 구축 알고리즘의 확장)

  • Lee, Ho-Seok;Shim, Kyu-Seok;Yi, Byoung-Kee
    • Journal of KIISE:Databases / v.34 no.4 / pp.369-377 / 2007
  • A histogram is one of the tools that efficiently summarize data, and it is widely used for selectivity estimation and approximate query answering. Existing histogram construction algorithms are applicable to point data represented by a set of values. However, interval data such as daily temperatures and daily stock prices are encountered as often as point data. In this paper, we therefore propose histogram construction algorithms for interval data by extending several methods used in existing histogram construction algorithms. Our experimental results on synthetic data show that our algorithms outperform a naive extension of existing algorithms.

Odysseus/Parallel-OOSQL: A Parallel Search Engine using the Odysseus DBMS Tightly-Coupled with IR Capability (오디세우스/Parallel-OOSQL: 오디세우스 정보검색용 밀결합 DBMS를 사용한 병렬 정보 검색 엔진)

  • Ryu, Jae-Joon;Whang, Kyu-Young;Lee, Jae-Gil;Kwon, Hyuk-Yoon;Kim, Yi-Reun;Heo, Jun-Suk;Lee, Ki-Hoon
    • Journal of KIISE:Computing Practices and Letters / v.14 no.4 / pp.412-429 / 2008
  • As the amount of electronic documents increases rapidly with the growth of the Internet, parallel search engines capable of handling a large number of documents are becoming ever more important. To implement a parallel search engine, we need to partition the inverted index and search through the partitioned index in parallel. There are two methods of partitioning the inverted index: 1) document-identifier based partitioning and 2) keyword-identifier based partitioning. However, each method alone has drawbacks. The former is convenient for inserting documents and has high throughput, but performs poorly for top-k query processing. The latter performs well for top-k query processing, but is inconvenient for inserting documents and has low throughput. In this paper, we propose a hybrid partitioning method to compensate for the drawbacks of each method. We design and implement a parallel search engine that supports the hybrid partitioning method using the Odysseus DBMS tightly coupled with information retrieval capability. We first introduce the architecture of the parallel search engine, Odysseus/Parallel-OOSQL. We then show the effectiveness of the proposed system through systematic experiments. The experimental results show that the query processing time of the document-identifier based partitioning method is approximately inversely proportional to the number of blocks in the partition of the inverted index. The results also show that the keyword-identifier based partitioning method performs well in top-k query processing. The proposed parallel search engine can be optimized for performance by customizing the method of partitioning the inverted index according to the application environment. The Odysseus/Parallel-OOSQL parallel search engine is capable of indexing, storing, and querying 100 million web documents per node, or tens of billions of web documents for the entire system.
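
The two partitioning schemes contrasted in this abstract can be illustrated with a toy in-memory inverted index. This is only a sketch under assumptions: the function names, the modulo and CRC32 placement rules, and the whitespace tokenizer are hypothetical and are not taken from Odysseus/Parallel-OOSQL, and the hybrid method itself is not shown.

```python
import zlib
from collections import defaultdict
from typing import Dict, List

Partition = Dict[str, List[int]]  # keyword -> posting list of document ids

def partition_by_document(docs: Dict[int, str], n: int) -> List[Partition]:
    """Each partition indexes a subset of documents; every keyword may appear in all partitions."""
    parts = [defaultdict(list) for _ in range(n)]
    for doc_id in sorted(docs):
        for word in sorted(set(docs[doc_id].split())):
            parts[doc_id % n][word].append(doc_id)
    return [dict(p) for p in parts]

def partition_by_keyword(docs: Dict[int, str], n: int) -> List[Partition]:
    """Each partition owns a subset of keywords and holds their complete posting lists."""
    parts = [defaultdict(list) for _ in range(n)]
    for doc_id in sorted(docs):
        for word in sorted(set(docs[doc_id].split())):
            # CRC32 is used only as a deterministic stand-in for a placement function.
            parts[zlib.crc32(word.encode("utf-8")) % n][word].append(doc_id)
    return [dict(p) for p in parts]

if __name__ == "__main__":
    docs = {1: "a b", 2: "b c", 3: "a c", 4: "a"}
    print(partition_by_document(docs, 2))
    print(partition_by_keyword(docs, 2))
```

In the document-based layout, inserting a document touches a single partition, but a query must merge partial posting lists from every partition. In the keyword-based layout, a keyword's full posting list sits in one place, but inserting a document may touch many partitions.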

Improving Fatigue Strength of Weld Joints by Blast Cleaning used in Painting Steel Bridge (강교 도장용 블라스트 표면처리에 의한 용접이음의 피로강도 향상)

  • Kim, In-Tae;Jung, Young-Soo;Lee, Dong-Uk
    • Journal of Korean Society of Steel Construction / v.23 no.2 / pp.137-146 / 2011
  • In the fabrication of steel bridges, blast cleaning prior to painting is carried out on the steel members to clean the forged surface and to increase the adhesive property of the applied painting systems. The effect of blast cleaning on the fatigue strength improvement of the weld joints, however, is not clear. In this study, Almen strips and steel specimens were blast-treated under ten types of blast-cleaning conditions derived from the blast-cleaning conditions of seven steel structure fabrication companies. The arc height, roughness, hardness, and compressive residual stress were measured before and after the implementation of the ten blast-cleaning methods, and the relationship between the blast conditions and the measured values was studied. The geometry of the weld toe and the compressive residual stress near the weld toe were also measured before and after the blast cleaning of the butt-welded joints, and fatigue tests were carried out on the butt-welded joints. The test results showed that blast cleaning significantly increases the fatigue strength and the fatigue limit.

Efficient Multi-Step k-NN Search Methods Using Multidimensional Indexes in Large Databases (대용량 데이터베이스에서 다차원 인덱스를 사용한 효율적인 다단계 k-NN 검색)

  • Lee, Sanghun;Kim, Bum-Soo;Choi, Mi-Jung;Moon, Yang-Sae
    • Journal of KIISE / v.42 no.2 / pp.242-254 / 2015
  • In this paper, we address the problem of improving the performance of multi-step k-NN search using multi-dimensional indexes. Due to the information loss caused by lower-dimensional transformations, existing multi-step k-NN search solutions produce a large tolerance (i.e., a large search range) and thus incur a large number of candidates, which are retrieved by a range query. These many candidates lead to overwhelming I/O and CPU overheads in the postprocessing step. To overcome this problem, we propose two efficient solutions that improve the search performance by reducing the tolerance of the range query and, accordingly, the number of candidates. First, we propose a tolerance reduction-based (approximate) solution that forcibly decreases the tolerance, which is determined by a k-NN query on the index, by the average ratio of high- and low-dimensional distances. Second, we propose a coefficient control-based (exact) solution that uses c·k instead of k in the k-NN query to obtain a tighter tolerance and performs a range query using this tighter tolerance. Experimental results show that the proposed solutions significantly reduce the number of candidates and, accordingly, improve the search performance in comparison with the existing multi-step k-NN solution.
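
The baseline multi-step k-NN procedure that this abstract sets out to improve can be sketched as follows. Several things here are assumptions made for illustration: linear scans stand in for the multi-dimensional index, truncating a vector to its first `low_dims` coordinates stands in for the lower-dimensional transformation (it lower-bounds the Euclidean distance), and the `shrink` parameter is a hypothetical knob that only mimics the idea of forcibly reducing the tolerance.

```python
import math
from typing import List, Sequence

def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def multi_step_knn(data: List[Sequence[float]], query: Sequence[float],
                   k: int, low_dims: int, shrink: float = 1.0) -> List[int]:
    """Return indices of the k nearest neighbors of query (exact when shrink == 1.0)."""
    q_low = query[:low_dims]
    low_dist = [dist(v[:low_dims], q_low) for v in data]

    # Step 1: k-NN query in the low-dimensional space.
    seeds = sorted(range(len(data)), key=lambda i: low_dist[i])[:k]

    # Step 2: the tolerance is the largest full-dimensional distance among the seeds.
    tolerance = max(dist(data[i], query) for i in seeds) * shrink

    # Step 3: low-dimensional range query collects the candidates.
    candidates = [i for i in range(len(data)) if low_dist[i] <= tolerance]

    # Step 4: postprocessing ranks candidates by their full-dimensional distance.
    # With shrink < 1.0 this may return fewer than k results or miss true neighbors.
    candidates.sort(key=lambda i: dist(data[i], query))
    return candidates[:k]

if __name__ == "__main__":
    data = [(0, 0, 0), (1, 0, 5), (2, 0, 0), (0.5, 0, 9), (3, 3, 3)]
    print(multi_step_knn(data, (0, 0, 0), k=2, low_dims=1))
```

In this toy data set a seed that is close in one dimension but far in full space inflates the tolerance, so every vector becomes a candidate. That is the overhead the abstract attributes to information loss in the lower-dimensional transformation.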

Performance Improvement of Web Information Retrieval Using Sentence-Query Similarity (문장-질의 유사성을 이용한 웹 정보 검색의 성능 향상)

  • Park Eui-Kyu;Ra Dong-Yul;Jang Myung-Gil
    • Journal of KIISE:Software and Applications / v.32 no.5 / pp.406-415 / 2005
  • The growth of the Internet has led to a web containing a huge number of documents. Increasing importance is therefore given to web information retrieval technology that can provide users with documents containing exactly the information they want. This paper proposes several techniques that are effective for improving web information retrieval. Similarity between a document and the query is a major source of information exploited by conventional systems. We instead suggest a technique that makes use of the similarity between a sentence and the query. We introduce a technique to compute an approximate score of the sentence-query similarity even without mature natural language processing technology. We show that the amount of computation for this task is linear in the number of documents in the total collection, which implies that practical systems can make use of this technique. The next important technique proposed in this paper is to use stratification of documents when re-ranking the documents to output. We show that it can lead to a significant improvement in performance. We furthermore show that using hyperlinks, anchor texts, and titles can enhance performance. To justify the proposed techniques, we developed a large-scale web information retrieval system and used it for experiments.