• Title/Summary/Keyword: Random query

Search Result 51, Processing Time 0.021 seconds

Efficient Dynamic Index Structure for SSD (SPM) (SSD에 적합한 동적 색인 저장 구조 : SPM)

  • Jin, Du-Seok;Kim, Jin-Suk;You, Beom-Jong;Jung, Hoe-Kyung
    • The Journal of the Korea Contents Association
    • /
    • v.10 no.2
    • /
    • pp.54-62
    • /
    • 2010
  • Inverted index structures have become the most efficient data structure for high performance indexing of large text collections, especially online index maintenance, In-Place and merge-based index structures are the two main competing strategies for index construction in dynamic search environments. In the above-mentioned two strategies, a contiguity of posting information is the mainstay of design for online index maintenance and query time. Whereas with the emergence of new storage device(SSD, SCRAM), those do not consider a contiguity of posting information in the design of index structures because of its superiority such as low access latency and I/O throughput speeds. However, SSD(Solid State Drive) is not well suited for traditional inverted structures due to the poor random write throughput in practical systems. In this paper, we propose the new efficient online index structure(SPM) for SSD that significantly reduces the query time and improves the index maintenance performance.

A Sequential Indexing Method for Multidimensional Range Queries (다차원 범위 질의를 위한 순차 색인 기법)

  • Cha Guang-Ho
    • Journal of KIISE:Databases
    • /
    • v.32 no.3
    • /
    • pp.254-262
    • /
    • 2005
  • This paper presents a new sequential indexing method called segment-page indexing (SP-indexing) for multidimensional range queries. The design objectives of SP-indexing are twofold:(1) improving the range query performance of multidimensional indexing methods (MIMs) and (2) providing a compromise between optimal index clustering and the full index reorganization overhead. Although more than ten years of database research has resulted in a great variety of MIMs, most efforts have focused on data-level clustering and there has been less attempt to cluster indexes. As a result, most relevant index nodes are widely scattered on a disk and many random disk accesses are required during the search. SP-indexing avoids such scattering by storing the relevant nodes contiguously in a segment that contains a sequence of contiguous disk pages and improves performance by offering sequential access within a segment. Experimental results demonstrate that SP-indexing improves query performance up to several times compared with traditional MIMs using small disk pages with respect to total elapsed time and it reduces waste of disk bandwidth due to the use of simple large pages.

An Adaptive Peer-to-Peer Search Algorithm for Reformed Node Distribution Rate (개선된 노드 분산율을 위한 적응적 P2P 검색 알고리즘)

  • Kim, Boon-Hee;Lee, Jun-Yeon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.10 no.4 s.36
    • /
    • pp.93-102
    • /
    • 2005
  • Excessive traffic of P2P applications in the limited communication environment is considered as a network bandwidth problem. Moreover, Though P2P systems search a resource in the phase of search using weakly connected systems(peers' connection to P2P overlay network is very weakly connected), it is not guaranteed to download the very peer's resource in the phase of download. In previous P2P search algorithm (1), we had adopted the heuristic peer selection method based on Random Walks to resolve this problems. In this paper, we suggested an adaptive P2P search algorithm based on the previous algorithm(1) to reform the node distribution rate which is affected in unit peer ability. Also, we have adapted the discriminative replication method based on a query ratio to reduce traffic amount additionally. In the performance estimation result of this suggested system, our system works on a appropriate point of compromise in due consideration of the direction of searching and distribution of traffic occurrence.

  • PDF

A study on the Extraction of Similar Information using Knowledge Base Embedding for Battlefield Awareness

  • Kim, Sang-Min;Jin, So-Yeon;Lee, Woo-Sin
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.11
    • /
    • pp.33-40
    • /
    • 2021
  • Due to advanced complex strategies, the complexity of information that a commander must analyze is increasing. An intelligent service that can analyze battlefield is needed for the commander's timely judgment. This service consists of extracting knowledge from battlefield information, building a knowledge base, and analyzing the battlefield information from the knowledge base. This paper extract information similar to an input query by embedding the knowledge base built in the 2nd step. The transformation model is needed to generate the embedded knowledge base and uses the random-walk algorithm. The transformed information is embedding using Word2Vec, and Similar information is extracted through cosine similarity. In this paper, 980 sentences are generated from the open knowledge base and embedded as a 100-dimensional vector and it was confirmed that similar entities were extracted through cosine similarity.

Bagged Auto-Associative Kernel Regression-Based Fault Detection and Identification Approach for Steam Boilers in Thermal Power Plants

  • Yu, Jungwon;Jang, Jaeyel;Yoo, Jaeyeong;Park, June Ho;Kim, Sungshin
    • Journal of Electrical Engineering and Technology
    • /
    • v.12 no.4
    • /
    • pp.1406-1416
    • /
    • 2017
  • In complex and large-scale industries, properly designed fault detection and identification (FDI) systems considerably improve safety, reliability and availability of target processes. In thermal power plants (TPPs), generating units operate under very dangerous conditions; system failures can cause severe loss of life and property. In this paper, we propose a bagged auto-associative kernel regression (AAKR)-based FDI approach for steam boilers in TPPs. AAKR estimates new query vectors by online local modeling, and is suitable for TPPs operating under various load levels. By combining the bagging method, more stable and reliable estimations can be achieved, since the effects of random fluctuations decrease because of ensemble averaging. To validate performance, the proposed method and comparison methods (i.e., a clustering-based method and principal component analysis) are applied to failure data due to water wall tube leakage gathered from a 250 MW coal-fired TPP. Experimental results show that the proposed method fulfills reasonable false alarm rates and, at the same time, achieves better fault detection performance than the comparison methods. After performing fault detection, contribution analysis is carried out to identify fault variables; this helps operators to confirm the types of faults and efficiently take preventive actions.

XML Repository Model based on the Edge-Labeled Graph (Edge-Labeled Graph를 적용한 XML 저장 모델)

  • 김정희;곽호영
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.7 no.5
    • /
    • pp.993-1001
    • /
    • 2003
  • A RDB Storage Model based on the Edge-Labeled Graph is suggested for store the XML instance in Relational Databases(RDB). The XML instance being stored is represented by Data Graph based on the Edge-Labeled Graph. Data Path Table, Element, Attribute, and Table Index Table values are extracted. Then Database Schema is defined, and the extracted values are stored using the Mapper. In order to support querry, Repository Model offers the translator translating XQL which is used as query language under XPATH, into SQL. In addition, it creates DBtoXML generator restoring the stored XML instance. As a result, storage relationship between the XML instance and proposed model structure can be expressed in terms of Graph-based Path, and it shows the possibility of easy search of random Element and Attribute information.

Performance Evaluation of SSD-Index Maintenance Schemes in IR Applications

  • Jin, Du-Seok;Jung, Hoe-Kyung
    • Journal of information and communication convergence engineering
    • /
    • v.8 no.4
    • /
    • pp.377-382
    • /
    • 2010
  • With the advent of flash memory based new storage device (SSD), there is considerable interest within the computer industry in using flash memory based storage devices for many different types of application. The dynamic index structure of large text collections has been a primary issue in the Information Retrieval Applications among them. Previous studies have proven the three approaches to be effective: In- Place, merge-based index structure and a combination of both. The above-mentioned strategies have been researched with the traditional storage device (HDD) which has a constraint on how keep the contiguity of dynamic data. However, in case of the new storage device, we don' have any constraint contiguity problems due to its low access latency time. But, although the new storage device has superiority such as low access latency and improved I/O throughput speeds, it is still not well suited for traditional dynamic index structures because of the poor random write throughput in practical systems. Therefore, using the experimental performance evaluation of various index maintenance schemes on the new storage device, we propose an efficient index structure for new storage device that improves significantly the index maintenance speed without degradation of query performance.

Contents Analysis and Synthesis Scheme for Music Album Cover Art

  • Moon, Dae-Jin;Rho, Seung-Min;Hwang, Een-Jun
    • Journal of IKEEE
    • /
    • v.14 no.4
    • /
    • pp.305-311
    • /
    • 2010
  • Most recent web search engines perform effective keyword-based multimedia contents retrieval by investigating keywords associated with multimedia contents on the Web and comparing them with query keywords. On the other hand, most music and compilation albums provide professional artwork as cover art that will be displayed when the music is played. If the cover art is not available, then the music player just displays some dummy or random images, but this has been a source of dissatisfaction. In this paper, in order to automatically create cover art that is matched with music contents, we propose a music album cover art creation scheme based on music contents analysis and result synthesis. We first (i) analyze music contents and their lyrics and extract representative keywords, (ii) expand the keywords using WordNet and generate various queries, (iii) retrieve related images from the Web using those queries, and finally (iv) synthesize them according to the user preference for album cover art. To show the effectiveness of our scheme, we developed a prototype system and reported some results.

Coding-based Storage Design for Continuous Data Collection in Wireless Sensor Networks

  • Zhan, Cheng;Xiao, Fuyuan
    • Journal of Communications and Networks
    • /
    • v.18 no.3
    • /
    • pp.493-501
    • /
    • 2016
  • In-network storage is an effective technique for avoiding network congestion and reducing power consumption in continuous data collection in wireless sensor networks. In recent years, network coding based storage design has been proposed as a means to achieving ubiquitous access that permits any query to be satisfied by a few random (nearby) storage nodes. To maintain data consistency in continuous data collection applications, the readings of a sensor over time must be sent to the same set of storage nodes. In this paper, we present an efficient approach to updating data at storage nodes to maintain data consistency at the storage nodes without decoding out the old data and re-encoding with new data. We studied a transmission strategy that identifies a set of storage nodes for each source sensor that minimizes the transmission cost and achieves ubiquitous access by transmitting sparsely using the sparse matrix theory. We demonstrate that the problem of minimizing the cost of transmission with coding is NP-hard. We present an approximation algorithm based on regarding every storage node with memory size B as B tiny nodes that can store only one packet. We analyzed the approximation ratio of the proposed approximation solution, and compared the performance of the proposed coding approach with other coding schemes presented in the literature. The simulation results confirm that significant performance improvement can be achieved with the proposed transmission strategy.

A Fragmentation and Search Method of Query Document for Partially Plagiarized Section Detection (부분표절구간 검출을 위한 질의문서의 분할 및 탐색 기법)

  • Ock, Chang-Seok;Seo, Jong-Kyu;Cho, Hwan-Gue
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2012.11a
    • /
    • pp.586-589
    • /
    • 2012
  • 표절과 관련된 이슈가 주목받고 있는 상황에서 표절을 검출하는 방법에 대한 연구가 활발히 진행되고 있다. 일반적으로 표절구간 검출을 위해 복잡한 자연어처리와 같은 의미론적 접근방법이 아닌 비교적 단순한 어휘기반의 문자열 처리 방법을 사용한다. 대표적인 방법으로는 지문법 (Fingerprinting)과 서열정렬 (Sequence alignment) 등이 있다. 하지만 이 방법들을 이용하여 대용량 문서에 대한 표절검사를 수행하기에는 시공간적 복잡도의 문제가 발생한다. 본 논문에서는 이러한 단점을 극복하기 위해 NGS (Next Generation Sequencing)에서 사용하는 BWT (Burrows-Wheeler Transform)[1]를 이용한 탐색방법을 응용한다. 또한 부분표절구간을 검출하고 정확도를 향상시키기 위해 질의문서를 분할하여 작은 조각으로 만든 뒤, 조각들에 대한 질의탐색을 수행한다. 본 논문에서는 질의문서를 분할하는 두 가지 방법을 소개한다. 두 가지 방법은 k-mer analysis를 이용한 방법과 random-split analysis를 이용한 방법으로, 각 방법의 장단점을 실험을 통해 분석하고 실제 부분표절구간의 검출 정확도를 측정하였다.