• Title/Abstract/Keywords: multidimensional index

Search results: 151 items (processing time: 0.022 seconds)

An Efficient Indexing Structure for Multidimensional Categorical Range Aggregation Query

  • Yang, Jian;Zhao, Chongchong;Li, Chao;Xing, Chunxiao
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 13, No. 2 / pp. 597-618 / 2019
  • Categorical range aggregation (CRA), which is conceptually equivalent to running a range aggregation query separately on multiple datasets, returns the query result for each dataset. The challenge is that when the number of datasets reaches hundreds or thousands, the query consumes a large amount of computation time and I/O. Previous work has addressed range restrictions on only a single dimension, whereas in practice a growing number of applications need statistics under multiple range restrictions. We propose the MCRI-Tree, an index structure designed to answer multi-dimensional categorical range aggregation queries, which can utilize main memory to maximize the efficiency of CRA queries. Specifically, the MCRI-Tree answers any query in $O(nk^{n-1})$ I/Os (where $n$ is the number of dimensions and $k$ denotes the maximum number of pages covered in one dimension among all $n$ dimensions during a query). The practical efficiency of our technique is demonstrated with extensive experiments.
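
To make the query semantics concrete, the following is a minimal brute-force sketch of a multi-dimensional categorical range aggregation with COUNT as the aggregate. It is not the MCRI-Tree; it only illustrates the answer such an index has to produce. The function name, the record layout, and the sample data are assumptions made for illustration.

```python
from collections import defaultdict


def categorical_range_count(records, ranges):
    """Count, per category, the records whose point lies inside every range.

    records: iterable of (category, point) pairs; point is a tuple of numbers.
    ranges:  one (low, high) pair per dimension, both ends inclusive.
    """
    counts = defaultdict(int)
    for category, point in records:
        # A record qualifies only if it satisfies the restriction on every dimension.
        if all(lo <= x <= hi for x, (lo, hi) in zip(point, ranges)):
            counts[category] += 1
    return dict(counts)


# Hypothetical two-dimensional data spread over three categories (datasets).
records = [
    ("A", (1, 5)),
    ("A", (3, 9)),
    ("B", (2, 6)),
    ("B", (8, 1)),
    ("C", (4, 4)),
]
result = categorical_range_count(records, [(1, 4), (4, 9)])
print(result)
```

This scan touches every record of every category, which is the cost an index structure of this kind is meant to avoid.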

Bayesian Methods for Wavelet Series in Single-Index Models

  • Park, Chun-Gun;Vannucci, Marina;Hart, Jeffrey D.
    • 한국데이터정보과학회:학술대회논문집 / 한국데이터정보과학회 2005 Spring Conference / pp. 83-126 / 2005
  • Single-index models have found applications in econometrics and biometrics, where multidimensional regression models are often encountered. Here we propose a nonparametric estimation approach that combines wavelet methods for non-equispaced designs with Bayesian models. We consider a wavelet series expansion of the unknown regression function and set prior distributions for the wavelet coefficients and the other model parameters. To ensure model identifiability, the direction parameter is represented via its polar coordinates. We employ ad hoc hierarchical mixture priors that perform shrinkage on wavelet coefficients and use Markov chain Monte Carlo methods for a posteriori inference. We investigate an independence-type Metropolis-Hastings algorithm to produce samples for the direction parameter. Our method leads to simultaneous estimates of the link function and of the index parameters. We present results on both simulated and real data, where we look at comparisons with other methods.

Posterior Inference in Single-Index Models

  • Park, Chun-Gun;Yang, Wan-Yeon;Kim, Yeong-Hwa
    • Communications for Statistical Applications and Methods / Vol. 11, No. 1 / pp. 161-168 / 2004
  • A single-index model is useful in fields that employ multidimensional regression models. Many parametric and nonparametric methods have been developed. In this paper, we consider posterior inference, using a wavelet series to approximate the true function in the single-index model. Posterior inference requires a prior distribution for each estimated parameter. We propose a hierarchical prior distribution for each coefficient of the wavelet series. The direction $\beta$ is assumed to be a unit vector and affects the estimate of the true function. Because of this constraint, the direction must be transformed to spherical polar coordinates $\theta$. Since the posterior distribution of the direction is unknown, we apply a Metropolis-Hastings algorithm to generate random samples of the direction. Through a Monte Carlo simulation, we investigate estimates of the true function and the direction.
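
The polar transformation of the direction can be written out explicitly. As a sketch only, assuming a $p$-dimensional direction (the symbol $p$ and the indexing of the angles are not given in the abstract), the standard spherical parameterization of a unit vector is:

```latex
\beta_1 = \cos\theta_1, \qquad
\beta_j = \cos\theta_j \prod_{i=1}^{j-1} \sin\theta_i \quad (2 \le j \le p-1), \qquad
\beta_p = \prod_{i=1}^{p-1} \sin\theta_i .
```

With this form $\sum_{j=1}^{p} \beta_j^2 = 1$ holds for any angles, so sampling the $p-1$ angles $\theta$ respects the unit-length constraint automatically. For $p = 2$ it reduces to $\beta = (\cos\theta_1, \sin\theta_1)$.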

주택 사업 분석 시스템 구축 : 서울지역 아파트 가격 데이터를 중심으로 (Implementing an Analysis System for Housing Business Based on Seoul Apartment Price Data)

  • 김태훈;이희석;김재윤;전진오;이은식
    • 정보기술과데이타베이스저널 / Vol. 6, No. 2 / pp. 115-130 / 1999
  • Because of the IMF, the price structure of the housing market now varies with market-price policy rather than with low- or high-price policy. The objective of this study is to develop a system for analyzing the housing market and its demand. The analysis system consists of four major categories: macro index analysis, market decision analysis, housing market analysis, and consumer analysis. We model each category using a variety of techniques, such as the generalized linear model, categorical analysis, bubble analysis, drill-down analysis, price sensitivity meter analysis, optimum price index analysis, profit index measurement analysis, correspondence analysis, conjoint analysis, and multidimensional scaling analysis. Seoul apartment data are analyzed to demonstrate the practical usefulness of the system.

A Dimensionality Assessment for Polytomously Scored Items Using DETECT

  • Kim, Hae-Rim
    • Communications for Statistical Applications and Methods / Vol. 7, No. 2 / pp. 597-603 / 2000
  • DETECT, a versatile dimensionality assessment index, was developed for binary item response data by Kim (1994). The present paper extends the use of DETECT to polytomously scored item data. A simulation study shows that DETECT performs well in differentiating multidimensional data from unidimensional data, yielding a greater DETECT value in the multidimensional case. Additional investigation is needed into dimensionally meaningful clustering methods, such as HAC for binary data, that are particularly sensitive to polytomous data.

최근접 질의를 위한 고차원 인덱싱 방법 (A High-Dimensional Indexing Method for Nearest Neighbor Queries)

  • 김상욱
    • 한국정보과학회논문지:데이타베이스 / Vol. 28, No. 4 / pp. 632-642 / 2001
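  • The nearest neighbor query is a very important operation in multimedia databases for finding the object most similar to a given query object. Most nearest neighbor query processing techniques use a multidimensional index to index objects effectively. However, existing multidimensional indexes, which represent the capsule of an object cluster with an N-dimensional rectangle or circle, suffer a sharp drop in search performance as the number of dimensions grows. This paper points out that this simple capsule representation is the main cause of degraded nearest neighbor query performance and proposes the following solutions: (1) adopting a new axis system suited to each cluster, (2) representing diverse capsule shapes by combining circles and rectangles, and (3) managing outliers separately. We also present an indexing structure that adopts these concepts and propose a nearest neighbor query processing method that uses it. Finally, we verify the superiority of the proposed technique through performance evaluation in a variety of experiments.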
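
For context, the rectangle and circle capsules that the abstract criticizes are typically used to prune clusters during a nearest neighbor search: a cluster can be skipped when the smallest possible distance from the query point to its capsule already exceeds the best distance found so far. The sketch below shows that bound for the two conventional shapes, assuming Euclidean distance. It does not implement the paper's proposed capsules, and the function names are made up for illustration.

```python
import math


def mindist_rect(q, lows, highs):
    """Smallest Euclidean distance from point q to an axis-aligned rectangle."""
    total = 0.0
    for x, lo, hi in zip(q, lows, highs):
        # Per-axis gap: zero when x falls inside [lo, hi] on that axis.
        gap = max(lo - x, 0.0, x - hi)
        total += gap * gap
    return math.sqrt(total)


def mindist_sphere(q, center, radius):
    """Smallest Euclidean distance from point q to a sphere (zero if q is inside)."""
    return max(0.0, math.dist(q, center) - radius)


print(mindist_rect((0, 0), (3, 4), (5, 6)))
print(mindist_sphere((0, 0), (3, 4), 2))
```

The looser the capsule fits its cluster, the smaller these bounds become and the fewer clusters can be pruned, which is consistent with the abstract's argument for better-fitting capsule shapes.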

타임 워핑을 지원하는 효율적인 서브시퀀스 매칭 기법 (A Subsequence Matching Technique that Supports Time Warping Efficiently)

  • 박상현;김상욱;조준서;이헌길
    • 산업기술연구 / Vol. 21, No. A / pp. 167-179 / 2001
  • This paper discusses index-based subsequence matching that supports time warping in large sequence databases. Time warping enables finding sequences with similar patterns even when they are of different lengths. In earlier work, we suggested an efficient method for whole matching under time warping. This method constructs a multidimensional index on a set of feature vectors, which are invariant to time warping, extracted from data sequences. For filtering in the feature space, it also applies a lower-bound function, which consistently underestimates the time warping distance and satisfies the triangle inequality. In this paper, we incorporate the prefix-querying approach based on sliding windows into the earlier approach. For indexing, we extract a feature vector from every subsequence inside a sliding window and construct a multidimensional index using the feature vectors as indexing attributes. For query processing, we perform a series of index searches using the feature vectors of qualifying query prefixes. Our approach provides effective and scalable subsequence matching even for large databases. We also prove that our approach does not incur false dismissals. To verify the superiority of our method, we perform extensive experiments. The results reveal that our method achieves significant speedup with real-world S&P 500 stock data and with very large synthetic data.
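
The filtering idea can be sketched as follows. The abstract does not say which features are used, so this example assumes a four-element feature vector (first, last, maximum, and minimum value), each of which is unchanged when elements of a sequence are repeated, and it assumes absolute difference as the per-element cost. Under those assumptions the largest feature difference never exceeds the time warping distance, so it can discard candidates without false dismissal.

```python
def dtw_distance(s, q):
    """Time warping distance with |a - b| as the per-element cost (assumed base distance)."""
    inf = float("inf")
    n, m = len(s), len(q)
    # cost[i][j] holds the cheapest alignment of s[:i] with q[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(s[i - 1] - q[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def features(seq):
    """Assumed warping-invariant features: first, last, maximum, minimum."""
    return (seq[0], seq[-1], max(seq), min(seq))


def lower_bound(s, q):
    """Largest feature difference; never exceeds dtw_distance(s, q) for these features."""
    return max(abs(a - b) for a, b in zip(features(s), features(q)))


s = [1, 2, 3, 3, 2]
q = [1, 3, 2]
print(lower_bound(s, q), dtw_distance(s, q))
```

Because the feature vector has a fixed length regardless of sequence length, it can be stored in a multidimensional index, and the expensive dynamic-programming distance is computed only for candidates that survive the filter.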

The Relationships between Benthic Macroinvertebrate and Environmental Factors in Iancheon and Bukcheon Streams, Korea

  • Bae, Mi-Jung;Park, Seon-Min;Kim, Ja-Kyung;Hong, Jeong-Gi;Ryu, Shi Hyun
    • 생태와환경 / Vol. 53, No. 1 / pp. 22-30 / 2020
  • In this study, we investigated the relationships between benthic macroinvertebrate assemblages and various environmental factors in the Iancheon (NIA) and Bukcheon (NBC) streams, Korea. We collected benthic macroinvertebrates and measured 33 environmental factors in April 2017 at 9 sites (5 sites in NIA and 4 sites in NBC). We identified 93 species (5 phyla, 9 classes, 16 orders, and 53 families) in NIA and 69 species (5 phyla, 9 classes, 17 orders, and 47 families) in NBC. According to the benthic macroinvertebrate index (BMI), both the NIA (88.2) and NBC (80.2) streams were in "very good" status. Upstream areas showed the highest scores, 95.5 (NIA1) and 94.2 (NBC1), whereas the BMI score was lowest in the downstream areas of both streams, especially at NBC4 (51.0, "bad" status). Cluster analysis and non-metric multidimensional scaling analysis revealed differences in benthic macroinvertebrate assemblages along spatial and anthropogenic gradients. Our findings provide reference data and highlight the need for continued monitoring to maintain the good status and manage macroinvertebrate diversity in these two streams in Sangju-si, Korea.

중국 주요 50개 도시의 전자상거래 발전성과에 대한 평가 (Evaluation on Development Performances of E-Commerce for 50 Major Cities in China)

  • 정동빈;왕강
    • 유통과학연구 / Vol. 14, No. 1 / pp. 67-74 / 2016
  • Purpose - This paper shows the degree of similarity and dissimilarity between pairs of 50 major cities in China on the basis of three evaluation variables (internet businessman index, internet shopping index, and e-commerce development index). A dissimilarity distance matrix is used to analyze both similarity and dissimilarity between each pair of the fifty cities by calculating dissimilarity as distance; a higher value signifies a higher degree of dissimilarity between two cities. Cluster analysis is used to classify the 50 cities into a number of groups such that similar cities are placed in the same group. In addition, the multidimensional scaling (MDS) technique provides a visual representation for exploring the pattern of proximities among the 50 major cities based on the three development performance attributes. Research design, data, and methodology - This research is based on the 2013 report provided by AliResearch in China (1/1/2013~11/30/2013) and uses multivariate methods such as the dissimilarity distance matrix, cluster analysis, and MDS, carried out with the CLUSTER, KMEANS, PROXIMITIES, and ALSCAL procedures in SPSS 21.0. Results - This research applies two types of cluster analysis and MDS to the three development performances based on the 2013 AliResearch report. The results confirm that the cities can be grouped into four clusters that share similar characteristics. MDS is used to position both the clusters and the 50 major cities belonging to each cluster. Since all the values for Shenzhen, Guangzhou, and Hangzhou (which belong to cluster 1) are very large, these cities are superior to the other cities in all three evaluation attributes. Twelve cities (Beijing, ShangHai, Jinghua, ZhuHai, XiaMen, SuZhou, NanJing, DongWan, ZhangShan, JiaXing, NingBo, and FoShan), which belong to cluster 3, are inferior to those of cluster 1 in all three attributes, but they can be expected to lead the next e-commerce revolution. In terms of the internet businessman index, on the other hand, Tainan, Taizhong, and Gaoxiong (which belong to cluster 2) rank above the others; however, these three cities are inferior to the others on the internet shopping index. The remaining major cities, which belong to cluster 4, are relatively inferior in all three evaluation attributes, which is expected to spur creative innovation and entrepreneurship and thereby lead to e-commerce development in China as a whole. Conclusions - This study suggests implications that can help e-government officers and companies in both Korea and China form strategies. By surveying the development performances of 50 major cities, it is expected to give useful information for understanding the recent state of e-commerce in China. Therefore, we should develop marketing, branding, and communication relevant to online Chinese consumers. Such efforts include incentives that encourage consumers, such as loyalty points and coupons, and the building of in-house logistics networks.
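
As an illustration of "calculating dissimilarity as distance", the sketch below builds a Euclidean dissimilarity matrix from three index values per city. The paper used SPSS procedures; this Python version is only a stand-in, and the city labels and scores are invented placeholders rather than values from the AliResearch report.

```python
import math


def dissimilarity_matrix(rows):
    """Pairwise Euclidean distances between rows of attribute values."""
    n = len(rows)
    return [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]


# Hypothetical (businessman index, shopping index, development index) triples.
scores = {
    "city_a": (3.0, 4.0, 0.0),
    "city_b": (0.0, 0.0, 0.0),
    "city_c": (3.0, 4.0, 12.0),
}
names = list(scores)
matrix = dissimilarity_matrix([scores[name] for name in names])

for name, row in zip(names, matrix):
    print(name, [round(value, 2) for value in row])
```

A matrix of this form is the usual input to both hierarchical clustering and MDS, which is why the abstract treats it as the common starting point for the two analyses.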

건설공사 자재 관리를 위한 데이터 웨어하우스 개발 (Development of Data Warehouse for Construction Material Management)

  • 류한국
    • 한국건축시공학회지 / Vol. 11, No. 3 / pp. 319-325 / 2011
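  • In construction projects, how smoothly work proceeds depends largely on the steady supply of materials from suppliers, which account for a substantial share of the many resources involved. When materials are procured and distributed on time, work proceeds smoothly and the project can ultimately be completed within the planned construction period. This study presents a way to apply data warehouse technology to material management, which is important in construction projects. Information needed for construction material management, such as material lead time, material procurement ratio, and material installation ratio, is analyzed multidimensionally, and KPIs are set so that the results can be used as decision-making information. Ultimately, this study presents a method for making effective use of the large volume of material-related data generated by the operational systems of construction projects. That is, data warehouse technology, which can provide subject-oriented and integrated data, is used to deliver systematic material management information.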