• Title/Summary/Keyword: query patterns

Optimal Construction of Multiple Indexes for Time-Series Subsequence Matching (시계열 서브시퀀스 매칭을 위한 최적의 다중 인덱스 구성 방안)

  • Lim, Seung-Hwan;Kim, Sang-Wook;Park, Hee-Jin
    • Journal of KIISE:Databases / v.33 no.2 / pp.201-213 / 2006
  • A time-series database is a set of time-series data sequences, each of which is a list of changing values of an object over a given period of time. Subsequence matching is an operation that searches a time-series database for data subsequences whose changing patterns are similar to that of a query sequence. This paper addresses a performance issue of time-series subsequence matching. First, we quantitatively examine the performance degradation caused by the window size effect, and then show that the performance of subsequence matching with a single index is not satisfactory in real applications. We argue that index interpolation is a useful way to resolve this problem. Index interpolation performs subsequence matching by selecting the most appropriate index from multiple indexes, each built on windows of its own size. For index interpolation, we must first decide the sizes of the windows for the multiple indexes to be built. In this paper, we solve the problem of selecting optimal window sizes from the perspective of physical database design. To this end, given a set of query sequences to be performed on a target time-series database and a set of window sizes for building multiple indexes, we devise a formula that estimates the total cost of all the subsequence matchings. Based on this formula, we propose an algorithm that determines the window sizes that maximize the overall performance of subsequence matching. We formally prove the optimality as well as the effectiveness of the algorithm. Finally, we perform a series of extensive experiments with a real-life stock data set and a large synthetic data set. The results reveal that the proposed approach outperforms the previous one by a factor of 1.5 to 7.8.
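The abstract does not spell out how a query uses a window-based index. The sketch below assumes the bookkeeping of the classic FRM-style scheme:

- Data sequences are indexed by sliding windows of size w.
- A query is cut into disjoint windows of the same size.

The selection rule `pick_window_size` is a hypothetical stand-in and not the paper's cost-based formula. It picks the largest indexed window size that does not exceed the query length. The window sizes and the query are also invented.

```python
from typing import List, Sequence, Tuple

def pick_window_size(query_len: int, window_sizes: Sequence[int]) -> int:
    """Hypothetical rule: pick the largest indexed window size that fits in the query."""
    usable = [w for w in window_sizes if w <= query_len]
    if not usable:
        raise ValueError("query is shorter than every indexed window size")
    return max(usable)

def sliding_windows(data: Sequence[float], w: int) -> List[Tuple[float, ...]]:
    """All length-w windows of a data sequence (what an index on window size w stores)."""
    return [tuple(data[i:i + w]) for i in range(len(data) - w + 1)]

def disjoint_windows(query: Sequence[float], w: int) -> List[Tuple[float, ...]]:
    """Split a query into floor(len(query) / w) non-overlapping windows of size w."""
    return [tuple(query[i * w:(i + 1) * w]) for i in range(len(query) // w)]

# Hypothetical setup: three indexes with window sizes 16, 64 and 256, and a query of length 100.
sizes = [16, 64, 256]
query = [float(v) for v in range(100)]
w = pick_window_size(len(query), sizes)
pieces = disjoint_windows(query, w)
print(w, len(pieces))  # 64 1
```

With a single index of size 16, the same query would yield six short pieces. With the size-64 index, it yields one longer piece. This trade-off between window size and query length is what motivates keeping several indexes.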

Applying an Aggregate Function AVG to OLAP Cubes (OLAP 큐브에서의 집계함수 AVG의 적용)

  • Lee, Seung-Hyun;Lee, Duck-Sung;Choi, In-Soo
    • Journal of the Korea Society of Computer and Information / v.14 no.1 / pp.217-228 / 2009
  • Data analysis applications typically aggregate data across many dimensions, looking for unusual patterns. Although such analyses are usually possible with standard Structured Query Language (SQL) queries, the queries may become very complex, and a complex query may require many scans of the base table, leading to poor performance. Because online analytical processing (OLAP) queries are usually complex, a dedicated aggregation operator, called the data cube or simply the cube, has been defined. The data cube supports OLAP tasks such as aggregation and subtotals. Many aggregate functions can be used to construct a data cube; they fall into three categories: distributive, algebraic, and holistic. It has been thought that distributive functions such as SUM, COUNT, MAX, and MIN can be used to construct a data cube, and that an algebraic function such as AVG can also be used if it is replaced by an intermediate function. The common belief is that although AVG is not distributive, the intermediate function (SUM, COUNT) is, so AVG can always be computed from (SUM, COUNT). In this paper, however, we find that the intermediate function (SUM, COUNT) cannot be applied to OLAP cubes, and that using it consequently leads to erroneous conclusions and decisions. The objective of this study is to identify problems in applying the aggregate function AVG to OLAP cubes and to design a process for solving them.
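The conventional reasoning that the paper re-examines fits in a few lines:

- Each cube cell keeps a (SUM, COUNT) pair.
- Pairs are merged by adding both components.
- AVG is computed only at the end.

The cell values below are made up. This sketch does not reproduce the paper's argument that the approach fails for OLAP cubes. It only shows how the intermediate function works, and why averaging cell averages directly is wrong when cells have different counts.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass(frozen=True)
class SumCount:
    """Intermediate state for AVG: (SUM, COUNT) pairs merge by component-wise addition."""
    total: float
    count: int

    def merge(self, other: "SumCount") -> "SumCount":
        return SumCount(self.total + other.total, self.count + other.count)

    def avg(self) -> float:
        return self.total / self.count

def summarize(values: Iterable[float]) -> SumCount:
    vals = list(values)
    return SumCount(sum(vals), len(vals))

# Two hypothetical cube cells, e.g. sales in two regions.
east = summarize([10.0, 20.0, 30.0])  # AVG 20.0
west = summarize([100.0])             # AVG 100.0

rollup = east.merge(west)
print(rollup.avg())                     # 40.0 = 160 / 4
print((east.avg() + west.avg()) / 2)    # 60.0: averaging the averages ignores the counts
```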

Extracting Maximal Similar Paths between Two XML Documents using Sequential Pattern Mining (순차 패턴 마이닝을 사용한 두 XML 문서간 최대 유사 경로 추출)

  • 이정원;박승수
    • Journal of KIISE:Databases / v.31 no.5 / pp.553-566 / 2004
  • The main current research areas involving XML techniques include storing XML documents, optimizing queries, and indexing. Beyond these, we may need to handle sets of documents that have varied structures and do not share a common structure such as the same DTD or XML Schema. In such cases, it is essential to analyze the structural similarities and differences among the documents. For example, when documents from the Web or from an EDMS (Electronic Document Management System) must be merged or classified, finding their common structure is very important for processing them. In this paper, we adapt sequential pattern mining algorithms(1) to extract maximal similar paths between two XML documents. Experiments with XML documents show that our adapted sequential pattern mining algorithms can exactly find the common structures and the maximal similar paths between documents. Analysis of the experimental results also shows that similarity metrics based on maximal similar paths can exactly classify the types of XML documents.

Efficient Processing method of OLAP Range-Sum Queries in a dynamic warehouse environment (다이나믹 데이터 웨어하우스 환경에서 OLAP 영역-합 질의의 효율적인 처리 방법)

  • Chun, Seok-Ju;Lee, Ju-Hong
    • The KIPS Transactions:PartD / v.10D no.3 / pp.427-438 / 2003
  • In a data warehouse, users typically search for trends, patterns, or unusual data behavior by issuing queries interactively. The OLAP range-sum query is widely used to find trends and to discover relationships among attributes in a data warehouse. In recent enterprise environments, data elements in a data cube change frequently, and the problem is that the cost of updating a prefix sum cube is very high. In this paper, we propose a novel algorithm that significantly reduces the update cost by using an index structure called the Δ-tree. We also propose a hybrid method that provides either approximate or precise results to reduce the overall cost of queries. This is highly beneficial for applications, such as decision support systems, that need quick approximate answers rather than time-consuming accurate ones. Extensive experiments show that our method performs very efficiently across diverse dimensionalities compared with other methods.
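The cost of updates comes from how a two-dimensional prefix sum cube works. Each cell stores the sum of all cells up to and including it, so any range sum needs only four lookups. However, changing one cell can change every prefix cell below and to the right of it. The sketch below is a plain prefix-sum illustration with a made-up 3 by 3 cube; it is not the paper's Δ-tree.

```python
from typing import List

Grid = List[List[int]]

def prefix_sum_cube(a: Grid) -> Grid:
    """P[i][j] = sum of a[0..i-1][0..j-1]; row 0 and column 0 are zero padding."""
    n, m = len(a), len(a[0])
    p = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            p[i + 1][j + 1] = a[i][j] + p[i][j + 1] + p[i + 1][j] - p[i][j]
    return p

def range_sum(p: Grid, r0: int, c0: int, r1: int, c1: int) -> int:
    """Sum of a[r0..r1][c0..c1] (inclusive) from four prefix-sum lookups."""
    return p[r1 + 1][c1 + 1] - p[r0][c1 + 1] - p[r1 + 1][c0] + p[r0][c0]

def cells_touched_by_update(n: int, m: int, i: int, j: int) -> int:
    """Changing a[i][j] changes every P[x][y] with x > i and y > j."""
    return (n - i) * (m - j)

a = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
p = prefix_sum_cube(a)
print(range_sum(p, 1, 1, 2, 2))             # 28 = 5 + 6 + 8 + 9
print(cells_touched_by_update(3, 3, 0, 0))  # 9: every prefix cell changes
```

Query cost stays constant, but in the worst case a single update rewrites the whole prefix cube. This is the imbalance the abstract targets.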

Uncertainty for Privacy and 2-Dimensional Range Query Distortion

  • Sioutas, Spyros;Magkos, Emmanouil;Karydis, Ioannis;Verykios, Vassilios S.
    • Journal of Computing Science and Engineering / v.5 no.3 / pp.210-222 / 2011
  • In this work, we study the problem of privacy-preserving data publishing in moving object databases. In particular, the trajectory of a mobile user in the plane is no longer a polyline in two-dimensional space; instead, it is a two-dimensional surface of fixed width $2A_{min}$, where $A_{min}$ is the semi-diameter of the minimum circular spatial extent that must replace the real location of the mobile user on the XY-plane in the anonymized (kNN) request. With this alone, however, the desired anonymity is not achieved and the entire system becomes vulnerable, because a malicious attacker can observe that over time many of the neighbors' IDs change, except for those of a small number of users. We therefore reinforce the privacy model by clustering mobile users according to their motion patterns in the (u, ${\theta}$) plane, where u and ${\theta}$ denote the velocity magnitude and the direction of motion (angle), respectively. In this case, the anonymized (kNN) request looks up neighbors who belong to the same cluster as the mobile requester in (u, ${\theta}$) space. Thus, we know that the trajectory of the k-anonymous mobile user lies within this surface, but not exactly where. We transform the surface's boundary polylines into dual points and focus on the information distortion introduced by this space transformation. We develop a set of efficient spatiotemporal access methods and experimentally measure the impact of information distortion by comparing the performance of the same spatiotemporal range queries executed on the original database and on the anonymized one.
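The (u, θ) motion features can be computed from two consecutive position fixes. The sketch below derives them with `math.hypot` and `math.atan2`, then groups users by bucketing (u, θ) on a fixed grid.

Several parts are illustrative assumptions:

- the grid bucketing itself;
- the cell sizes `du` and `dtheta`;
- the sample positions.

The abstract does not describe the paper's actual clustering method.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def motion_features(p0: Point, p1: Point, dt: float) -> Tuple[float, float]:
    """Speed u and heading theta (radians in [0, 2*pi)) between two position fixes dt apart."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    u = math.hypot(dx, dy) / dt
    theta = math.atan2(dy, dx) % (2 * math.pi)
    return u, theta

def grid_clusters(features: Dict[str, Tuple[float, float]],
                  du: float, dtheta: float) -> Dict[Tuple[int, int], List[str]]:
    """Hypothetical stand-in for clustering: users in the same (u, theta) grid cell form a group."""
    cells: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for user, (u, theta) in features.items():
        cells[(int(u // du), int(theta // dtheta))].append(user)
    return dict(cells)

feats = {
    "a": motion_features((0.0, 0.0), (3.0, 4.0), 1.0),   # speed 5
    "b": motion_features((1.0, 1.0), (4.0, 5.0), 1.0),   # same motion as "a"
    "c": motion_features((0.0, 0.0), (-2.0, 0.0), 1.0),  # speed 2, heading pi
}
clusters = grid_clusters(feats, du=1.0, dtheta=math.pi / 4)
print(clusters)
```

Users "a" and "b" start at different positions but share the same motion, so they land in the same group. An anonymized kNN request would draw its neighbors from that group.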

Physical Database Design for DFT-Based Multidimensional Indexes in Time-Series Databases (시계열 데이터베이스에서 DFT-기반 다차원 인덱스를 위한 물리적 데이터베이스 설계)

  • Kim, Sang-Wook;Kim, Jin-Ho;Han, Byung-Il
    • Journal of Korea Multimedia Society / v.7 no.11 / pp.1505-1514 / 2004
  • Sequence matching in time-series databases is an operation that finds the data sequences whose changing patterns are similar to that of a query sequence. Typically, sequence matching employs a multidimensional index for efficient processing. To alleviate the curse of dimensionality that affects multidimensional indexes in high-dimensional cases, previous methods for sequence matching apply the Discrete Fourier Transform (DFT) to data sequences and take only the first two or three DFT coefficients as the organizing attributes of the multidimensional index. This paper first points out the problems with such simple methods that take the first two or three coefficients, and then proposes a novel solution for constructing the optimal multidimensional index. The proposed method analyzes the characteristics of a target database and, based on this analysis, identifies the organizing attributes with the best discrimination power. It also determines the optimal number of organizing attributes for efficient sequence matching by using a cost model. To show the effectiveness of the proposed method, we perform a series of experiments. The results show that the proposed method significantly outperforms the previous ones.
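The lower-bounding property that makes DFT-based indexing safe can be checked directly. The sketch below uses a plain-Python orthonormal DFT, chosen for self-containment rather than speed. It keeps the first k coefficients as organizing attributes, which is the simple scheme the paper criticizes.

The check confirms that the distance on the kept coefficients never exceeds the true Euclidean distance. This follows from Parseval's theorem. As a result, filtering on the reduced distance causes no false dismissals. The two sample sequences are invented.

```python
import cmath
import math
from typing import List, Sequence

def dft(x: Sequence[float]) -> List[complex]:
    """Orthonormal DFT: X_f = (1/sqrt(n)) * sum_t x_t * exp(-2j*pi*f*t/n)."""
    n = len(x)
    scale = 1 / math.sqrt(n)
    return [scale * sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]

def features(x: Sequence[float], k: int) -> List[complex]:
    """Keep only the first k DFT coefficients as the index's organizing attributes."""
    return dft(x)[:k]

def feature_distance(a: Sequence[complex], b: Sequence[complex]) -> float:
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(a, b)))

def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))

x = [1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 1.0, 2.0]
y = [1.5, 2.5, 2.0, 2.0, 0.5, 0.5, 1.0, 3.0]
true_dist = euclidean(x, y)
for k in (1, 2, 3, len(x)):
    lower = feature_distance(features(x, k), features(y, k))
    print(k, round(lower, 3), round(true_dist, 3))
```

Dropping any coefficient only removes non-negative terms from the sum. The bound therefore holds for any subset of coefficients, not just the first k. That leaves room to choose coefficients by discrimination power, as the paper describes.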

Design of Lazy Classifier based on Fuzzy k-Nearest Neighbors and Reconstruction Error (퍼지 k-Nearest Neighbors 와 Reconstruction Error 기반 Lazy Classifier 설계)

  • Roh, Seok-Beom;Ahn, Tae-Chon
    • Journal of the Korean Institute of Intelligent Systems / v.20 no.1 / pp.101-108 / 2010
  • In this paper, we propose a new lazy classifier that combines a fuzzy k-nearest neighbors approach with feature selection based on reconstruction error. Reconstruction error is the performance index for locally linear reconstruction. When a new query point is given, the fuzzy k-nearest neighbors approach defines the local area in which the local classifier applies and assigns weights to the data patterns within that local area. After the local area is defined and the weights are assigned, feature selection is carried out to reduce the dimension of the feature space. Once features are selected in terms of reconstruction error, the local classifier, a kind of polynomial, is developed using weighted least squares estimation. In addition, the experimental application includes a comparative analysis against several commonly used methods, such as standard neural networks, support vector machines, linear discriminant analysis, and C4.5 trees.
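The abstract does not give the weighting formula. A common choice, assumed here, is the fuzzy k-nearest neighbors weighting of Keller et al. Under it, neighbor j receives a weight proportional to 1/d_j^{2/(m-1)} for a fuzzifier m > 1.

The sketch below computes normalized weights for the k nearest patterns of a query point. It omits two steps of the method: the weighted least squares fit of the local polynomial classifier, and the reconstruction-error feature selection. The data points are invented.

```python
import math
from typing import List, Sequence, Tuple

def fuzzy_knn_weights(query: Sequence[float], data: Sequence[Sequence[float]],
                      k: int, m: float = 2.0) -> List[Tuple[int, float]]:
    """Return (index, weight) for the k nearest patterns; weights sum to 1.

    Assumed Keller-style weighting: weight_j is proportional to 1 / d_j ** (2 / (m - 1)),
    where d_j is the Euclidean distance to neighbor j and m > 1 is the fuzzifier.
    """
    if m <= 1:
        raise ValueError("fuzzifier m must be greater than 1")
    nearest = sorted((math.dist(query, x), i) for i, x in enumerate(data))[:k]
    eps = 1e-12  # guards against division by zero when the query equals a stored pattern
    raw = [(i, 1.0 / max(d, eps) ** (2.0 / (m - 1.0))) for d, i in nearest]
    total = sum(w for _, w in raw)
    return [(i, w / total) for i, w in raw]

data = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
for idx, weight in fuzzy_knn_weights((0.5, 0.0), data, k=3):
    print(idx, round(weight, 4))
```

The two patterns at distance 0.5 share almost all of the weight. The farthest pattern falls outside k = 3 and is excluded from the local area entirely.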

Strategies to Improve Nutrition for the Elderly in Suwon : Analysis of Dietary Behavior and Food Preferences (수원지역 노인 영양개선 전략 연구 : 식습관 및 식품기호도 분석)

  • 임경숙;민영희;이태영;김영주
    • Korean Journal of Community Nutrition / v.3 no.3 / pp.410-422 / 1998
  • To promote health, strategies and interventions to improve nutrition should be based on a proper diagnosis of the subjects' eating patterns. The elderly usually have traditional food habits and preferences that are very difficult to change. This study was designed to identify the dietary behavior and food preferences of the elderly in order to provide baseline data for the Elderly Nutrition Intervention Program of the Public Health Center. A survey questionnaire was developed for trained interviewers to query 151 elderly people from 5 community elderly centers located in Suwon, Korea. The majority of them ate regularly and partook of all available side dishes. Their major dietary problems were frequent consumption of salty foods and eating too quickly. They consumed grains and vegetables regularly but seldom ate dairy products, fruits, meat, or food prepared with oil. They also tended to eschew ready-made processed food, high-cholesterol food, and fast food, and they did not dine out as much as younger people. Desirable eating habit scores were not significantly influenced by socioeconomic variables or by nutrition-related characteristics, including nutrition knowledge, the Nutritional Risk Index (NRI), and a health concern score. However, meal balance scores were significantly higher in the younger group (p<.05) and in the higher household income group (p<.05). According to stepwise multiple regression analysis, NRI was the most important determinant of the desirable eating habit score for elderly men, whereas the health concern score was the most important for elderly women. The greatest predictor of the meal balance score was nutrition knowledge. The elderly liked sweet-tasting food, grains, rice, stews, and Korean-style soups. They disliked sour food, dairy products, processed food, and bread.
The results indicate that the Elderly Nutrition Education Program should focus on increasing the consumption of dairy products, fruits, and food with oil prepared by traditional Korean cooking methods. They also suggest that program planning should consider the socioeconomic status of the elderly, such as income and education level, as well as their health concerns.

Analysis of 『Jinguiyaolue』 Prescriptions using Database (데이터베이스를 이용한 『금궤요략』 처방(處方) 분석 연구)

  • Kim, SeongHo;Kim, SungWon;Kim, KiWook;Lee, ByungWook
    • Journal of Korean Medical classics / v.32 no.3 / pp.131-146 / 2019
  • Objectives: The aim of this paper is to study a methodology for effectively analyzing the prescriptions of the "Jinguiyaolue" using a database, and to explore the possibility of applying the data and queries built in the process to future comparative research with other texts. Methods: Using the "Xinbianzhongjingquanshu (新編仲景全書)" as the source text, the contents of the "Jinguiyaolue" were entered into the database, with each verse divided by content so that each part could be used individually. In addition, the herbal composition and disease pattern data of the previously constructed "Shanghanlun" database, designed for comparison with other texts, were used for comparative analysis. Results: 6 tables and 12 queries were created and used for input and analysis. Formulas could be retrieved by herbal combination, and the applications of these formulas could be gathered for comparison. Formulas could also be retrieved by disease pattern combination, and combinations of herbs and disease patterns could be used together. In comparison with other texts, examples and frequencies of herb usage could be compared relatively accurately, whereas disease patterns could not easily be compared. Conclusions: Querying by herbal combinations and disease pattern combinations retrieved the related passages and the herb/disease pattern combinations of the prescriptions in the "Jinguiyaolue", which shortened the time needed to research formulas across texts. However, standardization research on disease patterns is necessary for a more accurate comparative study that includes disease pattern information.
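The abstract does not describe the schema behind its 6 tables and 12 queries. As a hypothetical illustration of retrieving formulas by herbal combination, the sketch below uses Python's built-in `sqlite3` module. The `formula_herb` table, its columns, and the formula and herb names are invented placeholders. The query returns the formulas that contain every herb in a given combination.

```python
import sqlite3
from typing import List, Sequence

def formulas_with_all_herbs(conn: sqlite3.Connection, herbs: Sequence[str]) -> List[str]:
    """Formulas whose ingredient list contains every herb in `herbs`."""
    placeholders = ",".join("?" for _ in herbs)
    sql = f"""
        SELECT formula
        FROM formula_herb
        WHERE herb IN ({placeholders})
        GROUP BY formula
        HAVING COUNT(DISTINCT herb) = ?
        ORDER BY formula
    """
    return [row[0] for row in conn.execute(sql, (*herbs, len(herbs)))]

# Hypothetical in-memory data with placeholder names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE formula_herb (formula TEXT, herb TEXT)")
conn.executemany(
    "INSERT INTO formula_herb VALUES (?, ?)",
    [("formula_A", "herb_1"), ("formula_A", "herb_2"), ("formula_A", "herb_3"),
     ("formula_B", "herb_1"), ("formula_B", "herb_3"),
     ("formula_C", "herb_2")],
)
print(formulas_with_all_herbs(conn, ["herb_1", "herb_3"]))  # ['formula_A', 'formula_B']
```

`GROUP BY` with `HAVING COUNT(DISTINCT herb)` is one standard way to express "contains all of". A similarly shaped table linking passages to disease patterns could be queried the same way.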

Does the general public have concerns with dental anesthetics?

  • Razon, Jonathan;Mascarenhas, Ana Karina
    • Journal of Dental Anesthesia and Pain Medicine / v.21 no.2 / pp.113-118 / 2021
  • Background: Over the last two decades, consumers and patients have increasingly turned to internet search engines, including Google, for information. Google Trends records searches made with the Google search engine; it is free and provides data on search terms and related queries. One recent study found a large public interest in "dental anesthesia". In this paper, we further explore this interest in "dental anesthesia" and assess whether any patterns emerge. Methods: Google Trends and the search term "dental pain" were used to record consumer interest over a five-year period. Additionally, a top-ten related query list was generated using the search term "dental anesthesia". Related queries are grouped into two sections, a "top" category and a "rising" category. We then added further search terms such as wisdom tooth anesthesia, wisdom tooth general anesthesia, dental anesthetics, local anesthetic, dental numbing, anesthesia dentist, and dental pain. From the related queries generated for each search term, recurring themes were grouped together and ranked by the total sum of their relative search frequency (RSF) values. Results: Over the five-year period, Google Trends data show a 1.5% increase in searches for "dental pain". The related queries for dental anesthesia suggest a large public interest in how long local anesthetics last (total RSF = 231), even more than in potential side effects or toxicities (total RSF = 83). Conclusion: Based on these results, clinicians are advised to tell their patients clearly how long local anesthetics last, to better manage patient expectations.
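The ranking step in Methods groups related queries into themes and sums their RSF values, which amounts to a grouped sum. The sketch below shows it with `collections.Counter`. The query strings, theme labels, and RSF numbers are invented for illustration and are not the study's data.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

def rank_themes(related: Iterable[Tuple[str, int]],
                theme_of: Dict[str, str]) -> List[Tuple[str, int]]:
    """Sum RSF values per theme and return themes sorted by total RSF, highest first."""
    totals: Counter = Counter()
    for query, rsf in related:
        # Queries without an assigned theme fall into a catch-all bucket.
        totals[theme_of.get(query, "other")] += rsf
    return totals.most_common()

# Invented related queries and RSF values, for illustration only.
related = [
    ("how long does dental anesthesia last", 100),
    ("how long does numbing last", 60),
    ("dental anesthesia side effects", 40),
]
theme_of = {
    "how long does dental anesthesia last": "duration",
    "how long does numbing last": "duration",
    "dental anesthesia side effects": "side effects",
}
print(rank_themes(related, theme_of))  # [('duration', 160), ('side effects', 40)]
```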