• Title/Summary/Keyword: Top-k query

Search Result 66, Processing Time 0.027 seconds

Minimizing the MOLAP/ROLAP Divide: You Can Have Your Performance and Scale It Too

  • Eavis, Todd;Taleb, Ahmad
    • Journal of Computing Science and Engineering / v.7 no.1 / pp.1-20 / 2013
  • Over the past generation, data warehousing and online analytical processing (OLAP) applications have become the cornerstone of contemporary decision support environments. Typically, OLAP servers are implemented either on top of proprietary array-based storage engines (MOLAP) or as extensions to conventional relational DBMSs (ROLAP). While MOLAP systems do indeed provide impressive performance on common analytics queries, they tend to have limited scalability. Conversely, ROLAP's table-oriented model scales quite nicely, but offers mediocre performance at best relative to the MOLAP systems. In this paper, we describe a storage and indexing framework that aims to provide both MOLAP-like performance and ROLAP-like scalability by essentially combining some of the best features from both. Based upon a combination of R-trees and bitmap indexes, the storage engine has been integrated with a robust OLAP query engine prototype that is able to fully exploit the efficiency of the proposed storage model. Specifically, it utilizes an OLAP algebra coupled with a domain-specific query optimizer to map user queries directly to the storage and indexing framework. Experimental results demonstrate that not only does the design improve upon more naive approaches, but that it does indeed offer the potential to optimize both query performance and scalability.
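To make the bitmap-index half of the storage design in the abstract above more concrete, here is a minimal Python sketch. It builds one bitmap per distinct value of a dimension column and answers a conjunctive selection with a bitwise AND. The table contents, the column names, and the helpers `build_bitmap_index` and `matching_rows` are hypothetical and are not taken from the paper; the R-tree component is not shown.

```python
from collections import defaultdict

def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when row i holds that value."""
    index = defaultdict(int)
    for row_id, value in enumerate(column):
        index[value] |= 1 << row_id
    return dict(index)

def matching_rows(bitmap):
    """Return the row ids whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical fact table with two dimension columns (illustration only).
region = ["east", "west", "east", "north", "west"]
year = [2012, 2012, 2013, 2013, 2013]

region_index = build_bitmap_index(region)
year_index = build_bitmap_index(year)

# The predicate region = 'west' AND year = 2013 becomes a single bitwise AND.
hits = matching_rows(region_index["west"] & year_index[2013])
```

Here `hits` holds the ids of the rows that satisfy both predicates. Python integers stand in for the bit vectors, which keeps the sketch short; a real engine would more likely use compressed bitmaps.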

Approximate Top-k Labeled Subgraph Matching Scheme Based on Word Embedding (워드 임베딩 기반 근사 Top-k 레이블 서브그래프 매칭 기법)

  • Choi, Do-Jin;Oh, Young-Ho;Bok, Kyoung-Soo;Yoo, Jae-Soo
    • The Journal of the Korea Contents Association / v.22 no.8 / pp.33-43 / 2022
  • Labeled graphs are used to represent entities, their relationships, and their structures in real data such as knowledge graphs and protein interactions. With the rapid development of IT and the explosive increase in data, there has been a need for a subgraph matching technology that provides the information the user is interested in. In this paper, we propose an approximate Top-k labeled subgraph matching scheme that considers the semantic similarity of labels and the difference in graph structure. The proposed scheme utilizes a learning model based on FastText in order to consider the semantic similarity of labels. In addition, the label similarity graph (LSG) is used for approximate subgraph matching by calculating similarity values between labels in advance. Through the LSG, we can resolve the limitation of existing schemes, in which subgraph expansion is possible only if the labels match exactly. The scheme supports structural similarity for a query graph by performing searches up to 2 hops. Based on the similarity value, we provide k subgraph matching results. We conduct various performance evaluations in order to show the superiority of the proposed scheme.

Domain Question Answering System (도메인 질의응답 시스템)

  • Yoon, Seunghyun;Rhim, Eunhee;Kim, Deokho
    • KIISE Transactions on Computing Practices / v.21 no.2 / pp.144-147 / 2015
  • Question Answering (QA) services can provide exact answers to user questions written in natural language. This research focuses on how to build a QA system for a specific domain. The online and offline architecture of a QA system for a targeted domain, comprising domain detection, question analysis, reasoning, information retrieval, filtering, answer extraction, re-ranking, and answer generation, as well as data preparation, is presented herein. Test results with an official Frequently Asked Question (FAQ) set showed 68% top-1 accuracy and 77% top-5 accuracy. The contribution of each part, such as the question analysis system, document search engine, knowledge graph engine, and re-ranking module, to achieving the final answer is also presented.

Top-k Query Processing Algorithm supporting Privacy Preservation on the Outsourced Databases (아웃소싱 데이터베이스에서 정보보호를 지원하는 Top-k 질의처리 알고리즘)

  • Kim, Hyeong-Il;Kim, Hyeong-Jin;Shin, JaeHwan;Chang, Jae-Woo
    • Proceedings of the Korea Information Processing Society Conference / 2016.04a / pp.562-566 / 2016
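  • With the advancement of cloud computing, research on database outsourcing has been actively conducted. However, existing studies on privacy-preserving Top-k query processing suffer from information being exposed in various forms. Therefore, in this paper, we propose a secure Top-k query processing algorithm for outsourced databases that supports data protection, user query protection, and the hiding of data access patterns. Through performance evaluation, we show that the proposed scheme supports privacy preservation while also providing efficient performance.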

A study of Search trends about herbal medicine on online portal (온라인 포털에서 한약재 검색 트렌드와 의미에 대한 고찰)

  • Lee, Seungho;Kim, Anna;Kim, Sanghyun;Kim, Sangkyun;Seo, Jinsoon;Jang, Hyunchul
    • The Korea Journal of Herbology / v.31 no.4 / pp.93-100 / 2016
  • Objectives: The internet is the most common means of looking up information. It has been shown that 75.2% of internet users in their 20s have searched for health information. This study therefore aims to understand public interest in herbal medicine using internet search query volume data. Methods: Naver, the top internet portal web service in the Republic of Korea, has provided internet search query volume data from January 2007 to the present through the Naver data lab (http://datalab.naver.com) service. We collected the search query volume data provided by Naver for 606 herbal medicine names and sorted the data by peak and total search volume. Results: Among the herbal medicines with less bias, the most frequently searched by peak search volume was 'wasong (와송)', and the most frequently searched by total search volume was 'hasuo (하수오)'. Conclusions: This study shows the ranking of public interest in herbal medicines. Among the above herbal medicines, some had supply issues. There are also some herbal medicines that had very little demand in the Korean medicine market but attracted high public interest. It is therefore necessary to monitor the herbal medicines in which the public is highly interested. Furthermore, if the reliability of the data is established on the basis of studies such as this one, the data could be used in a herbal medicine monitoring service.
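The two rankings described in the Methods above (by peak and by total search volume) can be sketched in a few lines of Python. The herb names and monthly volumes below are made-up placeholders, not Naver data, and `rank_by` is a hypothetical helper introduced only for this illustration.

```python
# Hypothetical monthly search volumes per herbal medicine name (not real Naver data).
volumes = {
    "herb_a": [10, 80, 15],
    "herb_b": [40, 45, 50],
    "herb_c": [5, 5, 5],
}

def rank_by(volumes, score_fn):
    """Return names sorted from highest to lowest score under score_fn."""
    return sorted(volumes, key=lambda name: score_fn(volumes[name]), reverse=True)

by_peak = rank_by(volumes, max)   # ranks by the single highest month
by_total = rank_by(volumes, sum)  # ranks by the cumulative volume
```

The two orderings can disagree: a name with one short spike ranks high by peak volume, while a name with steady interest ranks high by total volume. This is presumably why the study reports both.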

Personalized Web Search using Query based User Profile (질의기반 사용자 프로파일을 이용하는 개인화 웹 검색)

  • Yoon, Sung Hee
    • Journal of the Korea Academia-Industrial cooperation Society / v.17 no.2 / pp.690-696 / 2016
  • Search engines that rely on morphological matching of user queries and web document content do not support individual interests. This research proposes a personalized web search scheme that returns results reflecting the users' query intent and personal preferences. The performance of personalized search depends on using an effective user profiling strategy to accurately capture the users' personal interests. In this study, the user profiles are databases of topic words and customized weights based on recent user queries and the frequency of topic words in the click history. To determine the precise meaning of ambiguous queries and topic words, this strategy uses WordNet to calculate the semantic relatedness to words in the user profile. The experiments were conducted by installing query expansion and re-ranking modules on general web search systems. The results showed that this method achieves 92% precision and 82% recall in the top 10 search results, demonstrating improved performance.
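For readers unfamiliar with the metrics quoted above, the following Python sketch shows one common way to compute precision and recall over the top k results. The abstract does not state the exact definitions used in the paper, so the formulas here are an assumption, and the document ids are hypothetical.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall over the first k ranked results.

    precision = relevant results in the top k / k
    recall    = relevant results in the top k / all relevant results
    """
    top_k = ranked[:k]
    hits = sum(1 for doc in top_k if doc in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical ranking returned by a search system and a hand-labeled relevant set.
ranked = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d9"}

precision, recall = precision_recall_at_k(ranked, relevant, k=4)
```

In this toy case two of the first four results are relevant, and two of the three relevant documents were retrieved.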

A Survey on the Detection of SQL Injection Attacks and Their Countermeasures

  • Nagpal, Bharti;Chauhan, Naresh;Singh, Nanhay
    • Journal of Information Processing Systems / v.13 no.4 / pp.689-702 / 2017
  • Structured Query Language (SQL) Injection continues to be one of the greatest security risks in the world according to the Open Web Application Security Project's (OWASP) [1] Top 10 Security Vulnerabilities 2013. The ease of exploitability and the severe impact put this attack at the top. As the countermeasures become more sophisticated, SQL Injection Attacks also continue to evolve, thus thwarting attempts to eliminate this attack completely. The vulnerable data is a source of worry for government and financial institutions. In this paper, a detailed survey of different types of SQL Injection and of proposed methods and theories is presented, along with various tools and their efficiency in intercepting and preventing SQL attacks.
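As a small illustration of the attack class surveyed above, the Python sketch below contrasts a query built by string concatenation with a parameterized query, using the standard-library `sqlite3` module. Parameterized queries are a widely used countermeasure; the abstract does not say which countermeasures the survey covers, so this example is an assumption. The table, data, and function names are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", "a1"), ("bob", "b2")])

def find_user_unsafe(name):
    # Vulnerable: user input is concatenated into the SQL text and parsed as SQL.
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def find_user_safe(name):
    # Placeholder binding: the input is passed as data and never parsed as SQL.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()

payload = "x' OR '1'='1"
leaked = find_user_unsafe(payload)  # the injected OR condition matches every row
blocked = find_user_safe(payload)   # no user has this literal name
```

With the concatenated query the payload rewrites the WHERE clause so that it is always true, while the parameterized version treats the same payload as an ordinary string to compare against.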

OLAP4R: A Top-K Recommendation System for OLAP Sessions

  • Yuan, Youwei;Chen, Weixin;Han, Guangjie;Jia, Gangyong
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.6 / pp.2963-2978 / 2017
  • The Top-K query currently plays a key role in a wide range of road network, decision making, and quantitative financial research. In this paper, a Top-K recommendation algorithm is proposed to solve the cold-start problem, and a tag generating method is put forward to enhance the semantic understanding of the OLAP session. In addition, a recommendation system for OLAP sessions called "OLAP4R" is designed using a collaborative filtering technique, aiming to guide the user to find the ultimate goals through interactive queries. OLAP4R utilizes a mixed system architecture consisting of multiple functional modules, which have a high extension capability to support additional functions. This system structure allows the user to configure multi-dimensional hierarchies and desirable measures to analyze the specific requirement, and gives recommendations with forthright responses. Experimental results show that our method raises the recall of the recommendations by 20% compared with traditional collaborative filtering, and that a visualization tag of the recommended sessions is provided, with modified changes, for the user to understand.

Thai Classical Music Matching Using t-Distribution on Instantaneous Robust Algorithm for Pitch Tracking Framework

  • Boonmatham, Pheerasut;Pongpinigpinyo, Sunee;Soonklang, Tasanawan
    • Journal of Information Processing Systems / v.13 no.5 / pp.1213-1228 / 2017
  • The pitch tracking of music has been researched for several decades. Several possible improvements are available for creating a good t-distribution, using the instantaneous robust algorithm for pitch tracking framework to perfectly detect pitch. This article shows how to detect the pitch of music utilizing an improved detection method which applies a statistical method; this approach uses a pitch track, or a sequence of frequency bin numbers. This sequence is used to create an index that offers useful features for comparing similar songs. The pitch frequency spectrum is extracted using a modified instantaneous robust algorithm for pitch tracking (IRAPT) as a base combined with the statistical method. The pitch detection algorithm was implemented, and the percentage of performance matching in Thai classical music was assessed in order to test the accuracy of the algorithm. We used the longest common subsequence to compare the similarities in pitch sequence alignments in the music. The experimental results of this research show that the accuracy of retrieval of Thai classical music using the t-distribution of instantaneous robust algorithm for pitch tracking (t-IRAPT) is 99.01%, and is in the top five ranking, with the shortest query sample being five seconds long.
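The abstract above compares pitch sequences with the longest common subsequence (LCS). A minimal Python sketch of that comparison step follows, using the standard dynamic-programming formulation. The frequency-bin sequences are made up, and the normalization in `similarity` is an assumption rather than the paper's matching percentage.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def similarity(a, b):
    """LCS length divided by the shorter sequence length (an assumed normalization)."""
    shorter = min(len(a), len(b))
    return lcs_length(a, b) / shorter if shorter else 0.0

# Hypothetical sequences of frequency bin numbers for a short query and a stored song.
query = [12, 14, 14, 17, 19]
song = [12, 13, 14, 14, 16, 17, 19, 21]
score = similarity(query, song)
```

Because LCS allows gaps, a short query can still match a longer recording when extra bins appear between the matching ones.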

Multi-dimensional Traveling salesman problem using Top-n Skyline query (Top-n 스카이라인 질의를 이용한 다차원 외판원 순회문제)

  • Jin, ChangGyun;Yang, Sevin;Kang, Eunjin;Kim, JiYun;Kim, Jongwan;Oh, Dukshin
    • Proceedings of the Korea Information Processing Society Conference / 2019.05a / pp.371-374 / 2019
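  • Location-based services, which use multi-attribute data to provide users with the information they need on PDAs or mobile phones, are used in logistics/transportation information services, bus/subway route guidance services, and similar applications. If the data these services provide is applied to the Traveling Salesman Problem, which seeks an optimal route, more accurate route services become possible. However, because the number of comparisons in traveling salesman algorithms grows exponentially as the amount of data increases, battery constraints limit their use on ordinary devices. To address this drawback, this paper proposes an optimal route algorithm over n-dimensional attributes that uses skyline queries, which can reduce the set of candidates for the optimal route. The experiments demonstrated the usefulness of the proposed method through accuracy and error rate, and showed the efficiency of the multi-dimensional approach by comparing its computation time with that of the existing method.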
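The candidate-reduction idea in the abstract above rests on the skyline query, which keeps only points that are not dominated by any other point. The Python sketch below shows a naive skyline computation under the assumption that lower values are better in every dimension. The `(distance_km, cost)` attributes are hypothetical, and the paper's Top-n selection and route construction are not shown.

```python
def dominates(p, q):
    """p dominates q if p is no worse in every dimension and strictly better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Hypothetical stops described by (distance_km, cost); lower is better for both.
stops = [(2, 9), (4, 4), (5, 6), (7, 3), (8, 8)]
candidates = skyline(stops)
```

In this toy data, `(5, 6)` and `(8, 8)` are both dominated by `(4, 4)` and drop out, so a route search would only need to consider the remaining stops.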