• Title/Summary/Keyword: Query Processing Method

A Load Balancing Method using Partition Tuning for Pipelined Multi-way Hash Join (다중 해시 조인의 파이프라인 처리에서 분할 조율을 통한 부하 균형 유지 방법)

  • Mun, Jin-Gyu;Jin, Seong-Il;Jo, Seong-Hyeon
    • Journal of KIISE:Databases
    • /
    • v.29 no.3
    • /
    • pp.180-192
    • /
    • 2002
  • We investigate the effect of data skew in join attributes on the performance of a pipelined multi-way hash join method, and propose two new hash join methods for the shared-nothing multiprocessor environment. The first proposed method allocates buckets statically in round-robin fashion, and the second allocates buckets dynamically according to a frequency distribution. Using hash-based joins, multiple joins can be pipelined so that the early results of a join are sent to the next join before the whole join is completed, without being staged on disk. The shared-nothing multiprocessor architecture is known to be more scalable in supporting very large databases. However, this hardware structure is very sensitive to data skew. Unless the pipelined execution of multiple hash joins includes some dynamic load balancing mechanism, the skew effect can severely degrade system performance. In this paper, we derive an execution model of the pipeline segment and a cost model, and develop a simulator for the study. As shown by our simulation with a wide range of parameters, join selectivities and relation sizes degrade system performance more as the degree of data skew grows. However, the proposed method, which uses a large number of buckets and a tuning technique, offers substantial robustness against a wide range of skew conditions.
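
The contrast between static round-robin bucket allocation and frequency-driven allocation can be illustrated with a minimal sketch. The greedy "largest bucket to the least-loaded processor" rule, the function names, and the toy data below are assumptions made for illustration; the abstract does not specify the paper's actual tuning procedure.

```python
from collections import Counter
import heapq


def round_robin_allocation(num_buckets, num_procs):
    """Static allocation: bucket b is assigned to processor b mod num_procs."""
    return {b: b % num_procs for b in range(num_buckets)}


def frequency_allocation(bucket_sizes, num_procs):
    """Greedy allocation (assumed rule): take buckets from largest to smallest
    and give each one to the processor with the smallest load so far."""
    heap = [(0, p) for p in range(num_procs)]  # entries are (load, processor)
    heapq.heapify(heap)
    assignment = {}
    for bucket, size in sorted(bucket_sizes.items(), key=lambda kv: -kv[1]):
        load, proc = heapq.heappop(heap)
        assignment[bucket] = proc
        heapq.heappush(heap, (load + size, proc))
    return assignment


def processor_loads(assignment, bucket_sizes, num_procs):
    """Total number of tuples each processor receives under an assignment."""
    loads = [0] * num_procs
    for bucket, proc in assignment.items():
        loads[proc] += bucket_sizes[bucket]
    return loads


# Toy skewed join-attribute values: value 0 and value 1 dominate.
keys = [0] * 50 + [1] * 20 + list(range(2, 32))
num_buckets, num_procs = 8, 2
bucket_sizes = Counter(k % num_buckets for k in keys)

rr_loads = processor_loads(
    round_robin_allocation(num_buckets, num_procs), bucket_sizes, num_procs)
freq_loads = processor_loads(
    frequency_allocation(bucket_sizes, num_procs), bucket_sizes, num_procs)
print(rr_loads, freq_loads)
```

On this toy input the frequency-driven assignment leaves the busiest processor with fewer tuples than round-robin does. Using many more buckets than processors gives the allocator finer units to move, which is consistent with the abstract's remark about a large number of buckets.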

Color-related Query Processing for Intelligent E-Commerce Search (지능형 검색엔진을 위한 색상 질의 처리 방안)

  • Hong, Jung A;Koo, Kyo Jung;Cha, Ji Won;Seo, Ah Jeong;Yeo, Un Yeong;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.109-125
    • /
    • 2019
  • As interest in intelligent search engines increases, various studies have been conducted to extract and utilize product-related features intelligently. In particular, when users search for goods in e-commerce search engines, the 'color' of a product is an important feature that describes the product. Therefore, it is necessary to handle the synonyms of color terms in order to produce accurate results for users' color-related queries. Previous studies have suggested a dictionary-based approach to processing synonyms for color features. However, the dictionary-based approach has the limitation that it cannot handle unregistered color-related terms in user queries. To overcome this limitation of the conventional methods, this research proposes a model which extracts RGB values from an internet search engine in real time and outputs similar color names based on designated color information. First, a color term dictionary was constructed for the basic color search; it includes color names and the R, G, B values of each color from the Korean color standard digital palette program and the Wikipedia color list. The dictionary was made more robust by adding 138 color names converted from English color names to foreign words in Korean, with the corresponding RGB values. The final color dictionary therefore includes a total of 671 color names and corresponding RGB values. The proposed method starts from the specific color a user searched for. Then, the presence of the searched color in the built-in color dictionary is checked. If the color exists in the dictionary, its RGB values in the dictionary are used as the reference values of the retrieved color. If the searched color does not exist in the dictionary, the top-5 Google image search results for the searched color are crawled and average RGB values are extracted from a certain middle area of each image. 
To extract the RGB values from images, a variety of approaches were attempted, since simply averaging the RGB values of the center area of the images has limits. As a result, clustering the RGB values in a certain area of the image and taking the average value of the highest-density cluster as the reference values showed the best performance. The reference RGB values of the searched color are then compared with the RGB values of all the colors in the previously constructed color dictionary. A color list is created from the colors within the range of ${\pm}50$ for each of the R, G, and B values. Finally, using the Euclidean distance between these candidates and the reference RGB values of the searched color, the color with the highest similarity among up to five colors becomes the final outcome. To evaluate the usefulness of the proposed method, we performed an experiment. In the experiment, 300 color names and their corresponding RGB values were obtained through questionnaires. They were used to compare the RGB values obtained from four different methods, including the proposed method. The average Euclidean distance in CIE-Lab using our method was about 13.85, a relatively low distance compared to 3088 for the case using the synonym dictionary only and 30.38 for the case using the dictionary with the Korean synonym website WordNet. The variant of the proposed method without clustering showed an average Euclidean distance of 13.88, which implies that the DBSCAN clustering of the proposed method can reduce the Euclidean distance. This research suggests a new color synonym processing method based on RGB values that combines the dictionary method with real-time synonym processing for new color names. This method removes the limitation of the dictionary-based approach, which is the conventional synonym processing method. 
This research can contribute to improving the intelligence of e-commerce search systems, especially the color search feature.
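
The final matching step described above (keep dictionary colors within ±50 per channel, then rank by Euclidean distance to the reference RGB values) can be sketched as follows. The function name, the five-entry toy dictionary, and the reference value are illustrative assumptions; they are not the paper's 671-entry dictionary or its code.

```python
import math


def similar_colors(reference, dictionary, window=50, top_k=5):
    """Return up to top_k color names, nearest first.

    A color is a candidate only if each of its R, G, B values lies within
    `window` of the corresponding reference value; candidates are then
    ranked by Euclidean distance in RGB space.
    """
    candidates = [
        (name, rgb)
        for name, rgb in dictionary.items()
        if all(abs(c - r) <= window for c, r in zip(rgb, reference))
    ]
    candidates.sort(key=lambda item: math.dist(item[1], reference))
    return [name for name, _ in candidates[:top_k]]


# Toy dictionary (assumption): a handful of common RGB triples.
COLOR_DICTIONARY = {
    "red": (255, 0, 0),
    "crimson": (220, 20, 60),
    "tomato": (255, 99, 71),
    "navy": (0, 0, 128),
    "salmon": (250, 128, 114),
}

# Hypothetical reference value, e.g. obtained from image clustering.
reference_rgb = (240, 30, 40)
result = similar_colors(reference_rgb, COLOR_DICTIONARY)
print(result)  # ['crimson', 'red']
```

The first name in the returned list plays the role of the "color with the highest similarity" in the abstract. The ±50 box filter is cheap and discards most of the dictionary before any distance is computed.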

Query-based Answer Extraction using Korean Dependency Parsing (의존 구문 분석을 이용한 질의 기반 정답 추출)

  • Lee, Dokyoung;Kim, Mintae;Kim, Wooju
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.161-177
    • /
    • 2019
  • In this paper, we study how to improve the performance of answer extraction in a Question-Answering system by using sentence dependency parsing results. A Question-Answering (QA) system consists of query analysis, which analyzes the user's query, and answer extraction, which extracts appropriate answers from the document. Various studies have been conducted on both. To improve the performance of answer extraction, it is necessary to accurately reflect the grammatical information of sentences. In Korean, because word order is free and sentence components are frequently omitted, dependency parsing is a good way to analyze syntax. Therefore, in this study, we improved the performance of answer extraction by adding features generated by dependency parsing to the inputs of the answer extraction model (Bidirectional LSTM-CRF). We compared the performance of the answer extraction model when given only basic word features generated without dependency parsing against its performance when the Eojeol tag feature and the dependency graph embedding feature are added. Since dependency parsing is performed on the basic unit of an Eojeol, which is a component of a sentence separated by spaces, the tag information of each Eojeol can be obtained as a result of the dependency parsing. The Eojeol tag feature is this tag information. The process of generating the dependency graph embedding consists of generating the dependency graph from the dependency parsing result and then learning the embedding of the graph. 
From the dependency parsing result, a graph is generated by mapping each Eojeol to a node, each dependency between Eojeols to an edge, and each Eojeol tag to a node label. In this process, either an undirected or a directed graph is generated, depending on whether the direction of the dependency relation is considered. To obtain the embedding of the graph, we used Graph2Vec, a method that finds the embedding of a graph from the subgraphs that constitute it. The maximum path length between nodes can be specified in the process of finding the subgraphs of a graph. If the maximum path length between nodes is 1, the graph embedding is generated only from direct dependencies between Eojeols; as the maximum path length grows, the graph embedding also includes indirect dependencies. In the experiment, the maximum path length between nodes is varied from 1 to 3, with and without considering the direction of dependency, and the performance of answer extraction is measured. Experimental results show that both the Eojeol tag feature and the dependency graph embedding feature improve the performance of answer extraction. In particular, the highest answer extraction performance was obtained when the direction of the dependency relation was considered and the dependency graph embedding generated with a maximum path length of 1 in the Graph2Vec subgraph extraction process was used as model input. From these experiments, we concluded that it is better to take the direction of dependency into account and to consider only direct connections rather than indirect dependencies between words. The significance of this study is as follows. First, we improved the performance of answer extraction by adding features based on dependency parsing results, taking into account the characteristics of Korean, namely free word order and frequent omission of sentence components. 
Second, we generated features from the dependency parsing result with a learning-based graph embedding method, without defining patterns of dependency between Eojeols. Future research directions are as follows. In this study, the features generated from dependency parsing are applied only to the answer extraction model in order to assess their value. In the future, if their performance is confirmed by applying the features to various natural language processing models, such as sentiment analysis or named entity recognition, the validity of the features can be verified more accurately.
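
The graph construction described above (Eojeol as node, dependency as edge, Eojeol tag as node label, directed or undirected) can be sketched with plain dictionaries. The tuple layout, the tag strings, and the dependent-to-head edge direction are assumptions for illustration. The `neighborhood` helper only shows how a maximum path length limits which nodes are considered; it is not Graph2Vec itself.

```python
from collections import deque


def build_dependency_graph(eojeols, directed=True):
    """Build node labels and an adjacency map from a dependency parse.

    eojeols: list of (index, tag, head_index); head_index is None for the root.
    Edges point from a dependent to its head (assumed convention); when
    directed is False the reverse edge is added as well.
    """
    labels = {idx: tag for idx, tag, _ in eojeols}
    adjacency = {idx: set() for idx in labels}
    for idx, _, head in eojeols:
        if head is None:
            continue
        adjacency[idx].add(head)
        if not directed:
            adjacency[head].add(idx)
    return labels, adjacency


def neighborhood(adjacency, start, max_path_length):
    """Nodes reachable from start in at most max_path_length edges (BFS)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_path_length:
            continue
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen - {start}


# Hypothetical three-Eojeol sentence: subject and object both depend on the verb.
parse = [(0, "NP_SBJ", 2), (1, "NP_OBJ", 2), (2, "VP", None)]

labels, directed_adj = build_dependency_graph(parse, directed=True)
_, undirected_adj = build_dependency_graph(parse, directed=False)

print(neighborhood(directed_adj, 0, 1))    # direct dependency only
print(neighborhood(undirected_adj, 0, 2))  # also reaches the sibling via the verb
```

With a maximum path length of 1 only the direct head is visible from a node, while a length of 2 in the undirected graph also pulls in indirectly related Eojeols, which mirrors the direct versus indirect dependency distinction in the abstract.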

Efficient Multiple Joins using the Synchronization of Page Execution Time in Limited Processors Environments (한정된 프로세서 환경에서 체이지 실행시간 동기화를 이용한 효율적인 다중 결합)

  • Lee, Kyu-Ock;Weon, Young-Sun;Hong, Man-Pyo
    • Journal of KIISE:Databases
    • /
    • v.28 no.4
    • /
    • pp.732-741
    • /
    • 2001
  • In relational database systems, the join operation is one of the most time-consuming query operations. Many parallel join algorithms have been developed to reduce the execution time. The multiple hash join algorithm using an allocation tree is one of the most efficient. However, it may incur some delay in processing each node of the allocation tree; this delay occurs in the tuple-probing phase because of the difference between the time to read one page of the outer relation and the time to process an already-read page. This delay problem was solved using the concept of synchronization of page execution time, which we had proposed. In this paper, the performance improvements in each node of the allocation tree are extended to the whole allocation tree, and the resulting performance is evaluated. In addition, we propose an efficient algorithm for multiple hash joins in environments with a limited number of processors, according to the relationship between the number of input relations in the allocation tree and the number of processors allocated to the tree. Finally, we analyze the performance by building an analytical cost model and verify its validity through various performance comparisons with the previous method.

Development of AI-based Real Time Agent Advisor System on Call Center - Focused on N Bank Call Center (AI기반 콜센터 실시간 상담 도우미 시스템 개발 - N은행 콜센터 사례를 중심으로)

  • Ryu, Ki-Dong;Park, Jong-Pil;Kim, Young-min;Lee, Dong-Hoon;Kim, Woo-Je
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.20 no.2
    • /
    • pp.750-762
    • /
    • 2019
  • The importance of the call center as a contact point for the enterprise is growing. However, call centers struggle to operate because agents lack knowledge and turn over frequently due to downturns in the business, which degrades the quality of customer service. Therefore, through an N-bank call center case study, we developed a system to reduce the burden of keeping up with business knowledge and to improve customer service quality. It is a "real-time agent advisor" system that provides agents with answers to customer questions in real time by combining AI technology for speech recognition, natural language processing, and question answering with existing call center information systems, such as a private branch exchange (PBX) and computer telephony integration (CTI). As a result of the case study, we confirmed that the speech recognition system for real-time call analysis and the corpus construction method improve the natural speech processing performance of the query response system. In particular, with named entity recognition (NER), the accuracy of the corpus learning improved by 31%. Also, after the agent advisor system was applied, the rate of positive agent feedback on its answers was 93.1%, which showed that the system is helpful to the agents.

Vector Approximation Bitmap Indexing Method for High Dimensional Multimedia Database (고차원 멀티미디어 데이터 검색을 위한 벡터 근사 비트맵 색인 방법)

  • Park Joo-Hyoun;Son Dea-On;Nang Jong-Ho;Joo Bok-Gyu
    • The KIPS Transactions:PartD
    • /
    • v.13D no.4 s.107
    • /
    • pp.455-462
    • /
    • 2006
  • Recently, filtering approaches using vector approximation, such as the VA-file[1] and the LPC-file[2], have been proposed to support similarity search in high-dimensional data spaces. These approaches filter out many irrelevant vectors by calculating the approximate distance from a query vector using compact approximations of the vectors in the database. Accordingly, the total elapsed time for similarity search is reduced because disk I/O time is cut by reading the compact approximations instead of the original vectors. However, the search time of the VA-file or LPC-file is not much shorter than that of brute-force search, because calculating the approximate distances requires a lot of computation. This paper proposes a new bitmap index structure to minimize this calculation time. To speed up the calculation, the value of an object is saved as a bit pattern that represents the spatial position of its feature vector in the data space, and the distance between objects is calculated with a bitwise XOR, which is much faster than the real-valued vector calculation. In our experiments, the proposed method shortened the total search time to about one fourth of the sequential search time and was up to twice as fast as the existing methods, because it greatly shortens the calculation time, although its data reading time is longer than that of the existing vector approximation based approaches. Consequently, when the database is fast enough, search performance can be improved further by shortening the filtering calculation time of the existing vector approximation methods.
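
The idea of encoding a vector's position as a bit pattern and comparing patterns with XOR can be illustrated with a minimal sketch. The abstract does not give the paper's actual bit layout, so the unary ("thermometer") code per dimension used here is an assumption, chosen because the number of differing bits then equals the L1 distance between quantization cells. Coordinates are assumed to lie in [0, 1).

```python
def thermometer_code(cell):
    """Unary code for a cell index: cell k becomes k low-order one-bits."""
    return (1 << cell) - 1


def encode(vector, levels=4):
    """Quantize each coordinate in [0, 1) into `levels` cells and concatenate
    the per-dimension thermometer codes into one integer bitmap."""
    bits_per_dim = levels - 1
    code = 0
    for x in vector:
        cell = min(int(x * levels), levels - 1)
        code = (code << bits_per_dim) | thermometer_code(cell)
    return code


def xor_distance(code_a, code_b):
    """Approximate distance: number of bit positions where the bitmaps differ."""
    return bin(code_a ^ code_b).count("1")


query = encode((0.1, 0.9))
near = encode((0.2, 0.8))   # same cells as the query
mid = encode((0.3, 0.6))    # one cell away in each dimension
far = encode((0.9, 0.1))    # opposite corner of the data space

print(xor_distance(query, near), xor_distance(query, mid), xor_distance(query, far))
# 0 2 6
```

Vectors whose XOR distance exceeds a threshold can be filtered out before any exact distance on the original vectors is computed, which is the role the bitmap plays in a filter-and-refine search.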

Region Based Image Similarity Search using Multi-point Relevance Feedback (다중점 적합성 피드백방법을 이용한 영역기반 이미지 유사성 검색)

  • Kim, Deok-Hwan;Lee, Ju-Hong;Song, Jae-Won
    • The KIPS Transactions:PartD
    • /
    • v.13D no.7 s.110
    • /
    • pp.857-866
    • /
    • 2006
  • The performance of an image retrieval system is usually very low because of the semantic gap between the low-level features and the high-level concept in a query image. Semantically relevant images may exhibit very different visual characteristics and may be scattered across several clusters. In this paper, we propose a content-based image retrieval approach which combines region-based image retrieval with a new relevance feedback method using adaptive clustering. Our main goal is to find semantically related clusters in order to narrow the semantic gap. Our method consists of region-based clustering processes and a cluster-merging process. All segmented regions of relevant images are organized into semantically related hierarchical clusters, and clusters are merged by finding the number of latent clusters. In the cluster-merging process, this method applies $T_v^2$, which uses v principal components, instead of the classical Hotelling's $T^2$ [1] to find the unknown number of clusters and to resolve the singularity problem in high dimensions; we demonstrate that there is little difference between the performance of $T^2$ and that of $T_v^2$. Experiments have demonstrated that the proposed approach is effective in improving the performance of an image retrieval system.

Signature-based Indexing Scheme for Similar Sub-Trajectory Retrieval of Moving Objects (이동 객체의 유사 부분궤적 검색을 위한 시그니쳐-기반 색인 기법)

  • Shim, Choon-Bo;Chang, Jae-Woo
    • The KIPS Transactions:PartD
    • /
    • v.11D no.2
    • /
    • pp.247-258
    • /
    • 2004
  • Recently, there has been research on storage and retrieval techniques for moving objects, which are of great interest to users in database application areas such as video databases, spatio-temporal databases, and mobile databases. In this paper, we propose a new signature-based indexing scheme which supports similar sub-trajectory retrieval as well as good retrieval performance on moving object trajectories. According to how the trajectory signature is generated from the trajectory data of a moving object, our scheme is classified into a concatenated signature-based indexing scheme for similar sub-trajectory retrieval, called the CISR scheme, and a superimposed signature-based indexing scheme for similar sub-trajectory retrieval, called the SISR scheme. Our indexing scheme can improve retrieval performance by greatly reducing the number of disk accesses to the data file, because it first scans all signatures and filters them before accessing the data file. In addition, we improve retrieval efficiency by applying the k-warping algorithm to measure the similarity between a query trajectory and a data trajectory. Finally, we evaluate the performance of the sequential scan method (SeqScan), the CISR scheme, and the SISR scheme in terms of data insertion time, retrieval time, and storage overhead. Our experimental results show that both the CISR scheme and the SISR scheme are better than sequential scan in terms of retrieval performance, and that the SISR scheme is especially superior to the CISR scheme.
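
The "scan signatures first, then access the data file" filtering step can be illustrated with generic superimposed coding. This is a sketch of the general signature-file technique, not the paper's SISR or CISR construction: the 64-bit width, the three bits per element, the SHA-256 based hashing, and the direction symbols are all assumptions.

```python
import hashlib

SIGNATURE_BITS = 64  # assumed signature width


def element_signature(element, bits_set=3):
    """Hash one trajectory element (e.g. a motion direction symbol) to a
    signature with up to `bits_set` bits turned on."""
    sig = 0
    for i in range(bits_set):
        digest = hashlib.sha256(f"{i}:{element}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % SIGNATURE_BITS)
    return sig


def trajectory_signature(trajectory):
    """Superimpose (bitwise OR) the signatures of all elements."""
    sig = 0
    for element in trajectory:
        sig |= element_signature(element)
    return sig


def may_contain(data_sig, query_sig):
    """Filter test: every query bit must also be set in the data signature.
    False means a definite miss; True means the data record must be checked."""
    return (data_sig & query_sig) == query_sig


data_trajectory = ["N", "NE", "E", "S"]
query_trajectory = ["NE", "E"]

data_sig = trajectory_signature(data_trajectory)
query_sig = trajectory_signature(query_trajectory)
print(may_contain(data_sig, query_sig))  # True: the candidate survives the filter
```

A superimposed signature never produces false negatives for contained elements, but it can produce false positives and it discards element order, so surviving candidates still need an exact comparison against the stored trajectory (the abstract uses k-warping for the similarity measure).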

Design and Development of Middleware for Clinical Trial System based on Brain MR Image (뇌 MR 영상기반 임상연구 시스템을 위한 미들웨어 설계 및 개발)

  • Jeon, Woong-Gi;Park, Kyoung-Jong;Lee, Young-Seung;Choi, Hyun-Ju;Jeong, Sang-Wook;Kim, Dong-Eog;Choi, Heung-Kook
    • Journal of Korea Multimedia Society
    • /
    • v.15 no.6
    • /
    • pp.805-813
    • /
    • 2012
  • In this paper, we designed and developed middleware that gives an existing brain disease clinical research system effective access to its database. The brain disease clinical research system consists of two parts: a register and an analyzer. The register collects the registration data, and the analyzer produces statistical data based on diverse variables. The middleware was designed for database management and for processing large data queries from clients. By separating each function into its own module, coupling between functions was reduced and the modules were implemented for reuse. The image data module uses a new compression method that converts images to text for effective management and storage in the database. We tested the middleware system using 700 actual clinical medical data records. As a result, the total data transmission was up to 115 times faster than with the existing system. The improved module structure makes it possible to provide robust and reliable system operation and enhanced security functionality. In the future, the importance of such middleware should grow with the construction of large medical databases.

Efficient Rotation-Invariant Boundary Image Matching Using the Envelope-based Lower Bound (엔빌로프 기반 하한을 사용한 효율적인 회전-불변 윤곽선 이미지 매칭)

  • Kim, Sang-Pil;Moon, Yang-Sae;Hong, Sun-Kyong
    • The KIPS Transactions:PartD
    • /
    • v.18D no.1
    • /
    • pp.9-22
    • /
    • 2011
  • In this paper we present an efficient solution to rotation-invariant boundary image matching. Computing the rotation-invariant distance between image time-series is a time-consuming process since it requires a lot of Euclidean distance computations for all possible rotations. In this paper we propose a novel solution that significantly reduces the number of distance computations using the envelope-based lower bound. To this end, we first present how to construct a single envelope from a query sequence and how to obtain a lower bound of the rotation-invariant distance using the envelope. We then show that the single envelope-based lower bound can reduce a number of distance computations. This approach, however, may cause bad performance since it may incur a larger lower bound by considering all possible rotated sequences in a single envelope. To solve this problem, we present a concept of rotation interval, and using the rotation interval we generalize the envelope-based lower bound by exploiting multiple envelopes rather than a single envelope. We also propose equi-width and envelope minimization divisions as the method of determining rotation intervals in the multiple envelope approach. Experimental results show that our envelope-based solutions outperform existing solutions by one or two orders of magnitude.
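
The envelope idea can be sketched as follows: an envelope stores, per position, the minimum and maximum over a set of rotated query sequences, and the distance from a data sequence to that band is a lower bound on its distance to every rotation in the set. The function names and the equal-width split into rotation intervals are illustrative assumptions, and the sketch omits the envelope minimization division mentioned in the abstract.

```python
import math


def rotate(seq, k):
    """Rotate a sequence (given as a list) left by k positions."""
    return seq[k:] + seq[:k]


def rotation_invariant_distance(query, data):
    """Exact distance: minimum Euclidean distance over all rotations."""
    return min(math.dist(rotate(query, k), data) for k in range(len(query)))


def envelope(query, start, stop):
    """Per-position min and max over the rotations start..stop-1."""
    rots = [rotate(query, k) for k in range(start, stop)]
    lower = [min(col) for col in zip(*rots)]
    upper = [max(col) for col in zip(*rots)]
    return lower, upper


def envelope_distance(lower, upper, data):
    """Distance from data to the envelope band; positions inside the band add 0."""
    total = 0.0
    for lo, hi, x in zip(lower, upper, data):
        if x > hi:
            total += (x - hi) ** 2
        elif x < lo:
            total += (lo - x) ** 2
    return math.sqrt(total)


def lower_bound(query, data, num_intervals=1):
    """Lower bound of the rotation-invariant distance using one envelope per
    equal-width rotation interval; the smallest per-interval bound is returned."""
    n = len(query)
    width = math.ceil(n / num_intervals)
    bounds = []
    for start in range(0, n, width):
        lo, hi = envelope(query, start, min(start + width, n))
        bounds.append(envelope_distance(lo, hi, data))
    return min(bounds)


query = [1.0, 2.0, 5.0, 3.0]
data = [9.0, 0.0, 1.0, 4.0]

single = lower_bound(query, data, num_intervals=1)
multi = lower_bound(query, data, num_intervals=2)
exact = rotation_invariant_distance(query, data)
print(single, multi, exact)
```

In a search, any data sequence whose lower bound already exceeds the current best distance can be discarded without computing the exact rotation-invariant distance. Splitting the rotations into intervals makes each envelope narrower, so the multi-envelope bound is never smaller than the single-envelope one.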