• Title/Summary/Keyword: Analysis of Query

The Study of DBaaS Hub System for Integration of Database In the Cloud Environment (클라우드 환경에서 데이터베이스 통합을 위한 DBaaS 허브 시스템에 관한 연구)

  • Jung, Kye-Dong;Hwang, Chi-Gon;Lee, Jong-Yong;Shin, Hyo-Young
    • Journal of Digital Convergence, v.12 no.9, pp.201-207, 2014
  • In the cloud environment, companies need data integration and analysis to support decision-making and policy. When a new system is added to such an environment, integrating its data takes considerable time and cost because properties differ across systems. In this paper, we therefore propose a DBaaS hub system for multi-database services. Each DBaaS may require a different database and needs data integration for the relevant service. Using an ontology, we propose a metadata query (meta-query) to resolve interoperability issues between the data of different DBaaS. The meta-query does not access the real data directly; it is a query at a higher level. When the actual database is accessed, the meta-query is converted through the ontology, and data integration is achieved by accessing the data with the converted query. We also construct a document-oriented database system using the metadata.

Metadata Management Method for Consistency and Recency in Digital Library (디지탈 도서관 환경에서 일관성과 최근성을 고려한 메타데이타 관리 방법)

  • Lee, Hai-Min;Park, Seog
    • Journal of KIISE:Databases, v.27 no.1, pp.22-32, 2000
  • The Digital Library is an integrated system combining an Information Retrieval System (IRS) and a Database Management System (DBMS). In the Digital Library environment, where dynamic query and update processing is required, existing transaction management methods cause the following problems. First, because the traditional consistency criteria are too restrictive, they increase query processing time and cannot guarantee that results reflect recent updates (recency). Second, query results can be unreliable because no consistency criteria are defined between source data and metadata. This paper models access to Dublin Core-based metadata as query transactions and update transactions, and presents an efficient method for managing them. In particular, it describes consistency criteria for metadata that take into account the consistency between the result of a query transaction and the state of the source data in the Digital Library, which differs from the consistency criteria of traditional transaction management. It also analyzes the viewpoint of query transactions with respect to recency and proposes a metadata management method that guarantees recency within metadata consistency.

Query Extension of Retrieve System Using Hangul Word Embedding and Apriori (한글 워드임베딩과 아프리오리를 이용한 검색 시스템의 질의어 확장)

  • Shin, Dong-Ha;Kim, Chang-Bok
    • Journal of Advanced Navigation Technology, v.20 no.6, pp.617-624, 2016
  • Hangul word embedding must be preceded by a noun extraction step. Otherwise, unnecessary words are included in training and efficient embedding results cannot be obtained. In this paper, we propose a model that retrieves more efficiently through query expansion using Hangul word embedding, Apriori, and text mining. The word embedding and Apriori step expands the query by extracting associated words according to the meaning and context of the query. The Hangul text mining step extracts similar answers and responds to the user using noun extraction, TF-IDF, and cosine similarity. The proposed model can improve answer accuracy by learning the answers of a specific domain and expanding the query with highly correlated terms. Future work should extract more correlated query terms by analyzing user queries stored in a database.
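
The answer-matching step named in this abstract (TF-IDF plus cosine similarity) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes noun extraction has already produced token lists, the smoothed IDF formula is one common choice rather than the one used in the paper, and the function names `tfidf_vectors`, `cosine_similarity`, and `rank_answers` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn token lists into sparse TF-IDF vectors (dict: term -> weight)."""
    n_docs = len(docs)
    # Document frequency: number of token lists that contain each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # Assumed weighting: relative term frequency times a smoothed IDF.
            term: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_answers(query_terms, answers):
    """Return (answer indices sorted by descending similarity, raw scores)."""
    # The query is vectorized together with the answers so they share one IDF.
    vectors = tfidf_vectors(answers + [query_terms])
    query_vec, answer_vecs = vectors[-1], vectors[:-1]
    scores = [cosine_similarity(query_vec, vec) for vec in answer_vecs]
    order = sorted(range(len(answers)), key=lambda i: scores[i], reverse=True)
    return order, scores
```

In this sketch an expanded query would simply be a longer `query_terms` list, so terms added by the expansion step contribute to the similarity score like any other term.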

EMQT : A Study on Enhanced M-ary Query Tree Algorithm for Sequential Tag IDs (연속적인 태그 ID들을 위한 M-ary 쿼리 트리 알고리즘의 향상에 관한 연구)

  • Yang, Dongmin;Shin, Jongmin
    • The Journal of Korean Institute of Communications and Information Sciences, v.38B no.6, pp.435-445, 2013
  • One of the most challenging issues in radio frequency identification (RFID) and near field communications (NFC) is to correctly and quickly recognize a number of tag IDs in the reader's field. Unlike probabilistic anti-collision schemes, a query tree based protocol guarantees identification of all tags, but it assumes that tag IDs are uniformly distributed. In real deployments, however, the prefix of a tag ID is uniquely assigned by EPCglobal and the remaining part is assigned sequentially by a company or manufacturer. In this paper, we propose an enhanced M-ary query tree protocol (EMQT), which effectively reduces unnecessary query-response cycles between similar tag IDs using m-bit arbitration and tag expectation. Theoretical analysis and simulation results show that EMQT significantly outperforms other schemes in terms of identification time, identification efficiency, and communication overhead.
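
To make the baseline concrete, the following is a minimal simulation of a plain binary query tree reader, the kind of scheme the abstract says EMQT improves on. It is not EMQT itself (m-bit arbitration and tag expectation are not modeled), and it assumes tag IDs are unique bit strings of equal length; the function name `query_tree_identify` is hypothetical.

```python
from collections import deque


def query_tree_identify(tag_ids):
    """Simulate a basic binary query tree reader.

    The reader broadcasts a prefix; every tag whose ID starts with that
    prefix responds. No response is an idle cycle, one response identifies
    a tag, and several responses collide, so the prefix is split by one bit.
    """
    pending = deque([""])  # prefixes still to be queried, starting with the empty prefix
    identified = []
    cycles = {"idle": 0, "success": 0, "collision": 0}
    while pending:
        prefix = pending.popleft()
        responders = [tag for tag in tag_ids if tag.startswith(prefix)]
        if not responders:
            cycles["idle"] += 1
        elif len(responders) == 1:
            cycles["success"] += 1
            identified.append(responders[0])
        else:
            cycles["collision"] += 1
            pending.append(prefix + "0")
            pending.append(prefix + "1")
    return identified, cycles


if __name__ == "__main__":
    # Sequential IDs share long prefixes, which forces many collision cycles.
    sequential = [format(i, "08b") for i in range(16, 24)]
    tags, cycles = query_tree_identify(sequential)
    print(len(tags), cycles)
```

Running the sketch on sequential IDs versus IDs spread across the ID space gives a rough feel for why shared prefixes cost extra query-response cycles.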

Efficient Processing of Multiple Group-by Queries in MapReduce for Big Data Analysis (맵리듀스에서 빅데이터 분석을 위한 다중 Group-by 질의의 효율적인 처리 기법)

  • Park, Eunju;Park, Sojeong;Oh, Sohyun;Choi, Hyejin;Lee, Ki Yong;Shim, Junho
    • KIISE Transactions on Computing Practices, v.21 no.5, pp.387-392, 2015
  • MapReduce is a framework used to process large data sets in parallel on a large cluster. A group-by query is a query that partitions the input data into groups based on the values of the specified attributes, and then evaluates the value of the specified aggregate function for each group. In this paper, we propose an efficient method for processing multiple group-by queries using MapReduce. Instead of computing each group-by query independently, the proposed method computes multiple group-by queries in stages with one or more MapReduce jobs in order to reduce the total execution cost. We compared the performance of this method with the performance of a less sophisticated method that computes each group-by query independently. This comparison showed that the proposed method offers better performance in terms of execution time.
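
The idea of computing several group-by queries in stages rather than independently can be illustrated in plain Python. This is a sketch under stated assumptions, not the paper's algorithm: the abstract does not describe how stages are chosen, so the example simply computes the finest grouping once and derives coarser groupings from that partial result, which is valid for a distributive aggregate such as SUM. The function names and the sample column names are hypothetical.

```python
from collections import defaultdict


def group_by_sum(rows, keys, value):
    """One MapReduce-style job: map each row to (key tuple, value), reduce by summing."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row[value]
    return dict(totals)


def staged_group_bys(rows, fine_keys, coarse_key_sets, value):
    """Answer several group-by queries with a single scan of the raw rows.

    Stage 1 aggregates on the finest key set. Stage 2 answers each coarser
    query from the (usually much smaller) stage-1 output instead of the raw input.
    Every coarse key set must be a subset of fine_keys.
    """
    fine = group_by_sum(rows, fine_keys, value)
    # Re-express the stage-1 output as rows so the same job can consume it.
    partial_rows = []
    for key, total in fine.items():
        row = dict(zip(fine_keys, key))
        row[value] = total
        partial_rows.append(row)
    results = {tuple(fine_keys): fine}
    for keys in coarse_key_sets:
        results[tuple(keys)] = group_by_sum(partial_rows, keys, value)
    return results
```

The independent baseline mentioned in the abstract would correspond to calling `group_by_sum` on the raw rows once per query; the staged version reads the raw rows only once.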

A Study on the Content Utilization of KISTI Science and Technology Information Service (KISTI 과학기술정보서비스의 콘텐츠 활용 분석)

  • Kang, Nam-Gyu;Hwang, Mi-Nyeong
    • Journal of Internet Computing and Services, v.21 no.4, pp.87-95, 2020
  • The Science and Technology Information Service provided by the Korea Institute of Science and Technology Information (KISTI) is designed so that users can easily and conveniently search and view content, and is built similarly to general information services. NDSL is KISTI's core science and technology information service, providing about 138 million content items and recording about 93 million page views in 2019. In this paper, we derive various insights by analyzing how science and technology information provided by NDSL, such as academic papers, reports, and patents, is searched and utilized through the web service (https://www.ndsl.kr), and by analyzing search query words. We analyzed the usability of each content type, such as academic papers and patents, covering general statistics (the status of content construction, utilization status and utilization methods by content type, monthly/weekly/time-of-day content usage, and content view rate per search by content type) as well as a comparison of the use of academic papers by year and the relationship between the utilization of domestic academic papers and the KCI index. We also analyzed query words in terms of their language form, the number of words per query, and the relationship between query words and timeliness by content type. Based on the results of these analyses, we propose ways to improve the service. The suggested NDSL improvements include dynamically reflecting content utilization behavior in search result rankings, extending queries, and establishing profile information through non-login user identification for targeted services.

An Efficient XML Query Processing Method using Path Containment Relationships (경로 포함 관계를 이용한 효율적인 XML 질의 처리기법)

  • 민경섭;김형주
    • Journal of KIISE:Databases, v.31 no.2, pp.183-194, 2004
  • As XML is a de facto standard data exchange language, there have been several studies on processing XML queries efficiently. The most important consideration when processing XML queries is how efficiently path expressions in queries can be processed. Some previous works produce results by performing a sequence of join operations on all records corresponding to labels in the path expression. Other works check the existence of paths in the query using an RDBMS's string comparison operator and produce results by extracting the records corresponding to the paths. In this paper, we propose a new query planning algorithm based on path containment relationships, together with two join operators that support the planning algorithm. The join operators take as input only the records related to the paths in a query, scan them only once, and generate result data using a pipelining mechanism. Through analysis and experiments, we confirmed that our techniques (the new query planning algorithm and the two join operators) achieve significantly higher performance than previous works.

A RealTime DNS Query Analysis System based On the Web (웹 기반 실시간 DNS 질의 분석 시스템)

  • Jang, Sang-Dong
    • Journal of Digital Convergence, v.13 no.10, pp.279-285, 2015
  • In this paper, we present the design and implementation of a real-time DNS query analysis system for detecting and protecting against DNS attacks. The proposed system uses mirroring to collect data in the DMZ and then analyzes the collected data. If the analysis finds attack information, that information is used as filtering information for the firewall. Statistics on the collected data are shown on the web as real-time monitoring information. To verify the effectiveness of the proposed system, we built it and conducted experiments. The results show that the proposed system can be used effectively to defend against DNS spoofing, DNS flooding attacks, and DNS amplification attacks, can prevent attackers inside the internal network from attacking, and provides real-time DNS query statistics and geographic information for monitoring DNS queries using the GeoIP API and Google API. This can be useful information for ICT convergence and future work.

Boolean Query Formulation From Korean Natural Language Queries using Syntactic Analysis (구문분석에 기반한 한글 자연어 질의로부터의 불리언 질의 생성)

  • Park, Mi-Hwa;Won, Hyeong-Seok;Lee, Geun-Bae
    • Journal of KIISE:Software and Applications, v.26 no.10, pp.1219-1229, 1999
  • There is considerable evidence that trained searchers achieve good search effectiveness with boolean queries, because a structured boolean query containing operators such as AND, OR, and NOT can represent a user's information need more accurately. However, ordinary users are not accustomed to expressing the information they want in boolean form. In this paper, we propose a boolean query formulation method that automatically transforms a user's natural language query into an extended boolean query, aiming at both search effectiveness and user convenience. First, the natural language query is syntactically analyzed using a KCCG (Korean Combinatory Categorial Grammar) parser, and the resulting syntactic trees are simplified by extracting operator and keyword information in order to capture the logical relationships between keywords. Next, plausible noun phrases are identified in the simplified tree and added to it as additional keywords, and weights are assigned to the keywords. Finally, the simplified syntactic tree is automatically converted into a boolean query using mapping rules and linguistic heuristics, and the search is performed. To minimize the loss in search performance caused by parsing errors, we also propose an N-BEST average method that generates and searches with a boolean query for each of the top N syntactic trees, compensating for a single incorrect top tree. In experiments using the KTSET2.0 test collection, the proposed method outperformed a traditional vector space model based natural language query system by 23% and, surprisingly, manually constructed boolean queries by 8%.

Analyzing RDF Data in Linked Open Data Cloud using Formal Concept Analysis

  • Hwang, Suk-Hyung;Cho, Dong-Heon
    • Journal of the Korea Society of Computer and Information, v.22 no.6, pp.57-68, 2017
  • The Linked Open Data (LOD) cloud is quickly becoming one of the largest collections of interlinked datasets and the de facto standard for publishing, sharing, and connecting pieces of data on the Web. Data publishers from diverse domains publish their data using the Resource Description Framework (RDF) data model and provide SPARQL endpoints for querying their data, which enables a global, distributed, and interconnected dataspace on the LOD cloud. Although structured data can be extracted as query results using SPARQL, users have very limited means of analyzing and visualizing RDF data from SPARQL query results. To tackle this issue, we propose a novel approach, based on Formal Concept Analysis, for analyzing and visualizing useful information from the LOD cloud. The RDF data analysis and visualization technique proposed in this paper can be used in semantic web data mining by extracting and analyzing the information and knowledge inherent in LOD and supporting classification and visualization.
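
For readers unfamiliar with Formal Concept Analysis, the sketch below enumerates the formal concepts of a tiny context. It is a generic textbook-style construction, not the approach of this paper: the mapping from RDF resources to objects and from their types to attributes is an assumption made for illustration, and the function name `formal_concepts` and the sample data are hypothetical.

```python
def formal_concepts(context):
    """Enumerate all formal concepts of a context.

    context maps each object to its set of attributes. A formal concept is a
    pair (extent, intent): the extent is every object that has all attributes
    in the intent, and the intent is every attribute shared by the extent.
    """
    all_attrs = frozenset().union(*context.values()) if context else frozenset()
    # Intents are exactly the intersections of object attribute sets,
    # plus the full attribute set (the intent of the empty extent).
    intents = {all_attrs}
    for attrs in context.values():
        attrs = frozenset(attrs)
        intents |= {attrs & intent for intent in intents}
    concepts = []
    for intent in intents:
        extent = frozenset(obj for obj, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    # Order from the smallest extent to the largest for stable output.
    return sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0])))


if __name__ == "__main__":
    # Hypothetical context: resources as objects, rdf:type-like labels as attributes.
    context = {
        "Seoul": {"City", "Capital"},
        "Busan": {"City"},
        "Korea": {"Country"},
    }
    for extent, intent in formal_concepts(context):
        print(sorted(extent), sorted(intent))
```

Ordering the resulting concepts by extent inclusion yields the concept lattice, which is the structure typically drawn when FCA results are visualized.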