• Title/Summary/Keyword: interval-valued data

Local linear regression analysis for interval-valued data

  • Jang, Jungteak;Kang, Kee-Hoon
    • Communications for Statistical Applications and Methods, v.27 no.3, pp.365-376, 2020
  • Interval-valued data, a type of symbolic data, record each observation as an interval rather than a single value. Such data often arise when large databases are aggregated into a more manageable form. Various regression methods for interval-valued data have been proposed relatively recently. In this paper, we introduce a nonparametric regression model using a kernel function and a nonlinear regression model for interval-valued data. We also propose applying the local linear regression model, one of the nonparametric methods, to interval-valued data. Simulations based on several distributions of the center point and the range are conducted using each of the methods presented in this paper. Under various conditions, the proposed local linear estimator performs better than the other methods.
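
As a rough illustration of local linear regression on interval-valued responses, the sketch below fits the interval centers and half-ranges separately and recombines them. This is a minimal sketch and not necessarily the estimator proposed in the paper: the scalar predictor, the Gaussian kernel, the bandwidth `h`, and the names `local_linear` and `predict_interval` are all assumptions made for illustration.

```python
import math


def local_linear(x, y, x0, h):
    """Local linear estimate of E[y | x = x0] using a Gaussian kernel.

    Solves the weighted least squares problem
        min_{a, b}  sum_i w_i * (y_i - a - b * (x_i - x0))^2
    in closed form and returns the intercept a.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xi, yi in zip(x, y):
        d = xi - x0
        w = math.exp(-0.5 * (d / h) ** 2)  # Gaussian kernel weight
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * yi
        t1 += w * d * yi
    det = s0 * s2 - s1 * s1
    if det == 0.0:
        raise ValueError("degenerate design: need at least two distinct x values")
    return (s2 * t0 - s1 * t1) / det


def predict_interval(x, lower, upper, x0, h):
    """Predict an interval at x0 from interval-valued responses [lower_i, upper_i].

    Assumption: centers and half-ranges are smoothed separately, and a
    negative fitted half-range is clipped to zero.
    """
    centers = [(lo + up) / 2 for lo, up in zip(lower, upper)]
    half_ranges = [(up - lo) / 2 for lo, up in zip(lower, upper)]
    c = local_linear(x, centers, x0, h)
    r = max(local_linear(x, half_ranges, x0, h), 0.0)
    return c - r, c + r


if __name__ == "__main__":
    # Toy data: centers follow 2x + 1, half-range is a constant 0.5.
    xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    lows = [2 * v + 1 - 0.5 for v in xs]
    ups = [2 * v + 1 + 0.5 for v in xs]
    print(predict_interval(xs, lows, ups, x0=2.5, h=1.0))
```

Because a local linear fit reproduces exactly linear data, the toy example should return an interval close to (5.5, 6.5) regardless of the bandwidth.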

Calculating Attribute Values using Interval-valued Fuzzy Sets in Fuzzy Object-oriented Data Models (퍼지객체지향자료모형에서 구간값 퍼지집합을 이용한 속성값 계산)

  • Cho Sang-Yeop;Lee Jong-Chan
    • Journal of Internet Computing and Services, v.4 no.4, pp.45-51, 2003
  • In general, attribute values in fuzzy object-oriented data models are represented by fuzzy sets. Allowing these attribute values to be represented by interval-valued fuzzy sets lets the models represent attribute values in a more flexible manner. The attribute values of frames appearing in the inheritance structure of the fuzzy object-oriented data models are calculated by a prioritized conjunction operation using interval-valued fuzzy sets. This approach can be applied to knowledge and information processing in which the degree of membership is represented not by conventional fuzzy sets but by interval-valued fuzzy sets.

On principal component analysis for interval-valued data (구간형 자료의 주성분 분석에 관한 연구)

  • Choi, Soojin;Kang, Kee-Hoon
    • The Korean Journal of Applied Statistics, v.33 no.1, pp.61-74, 2020
  • Interval-valued data, one type of symbolic data, are observed in the form of intervals rather than single values. Each interval-valued observation has an internal variation. Principal component analysis reduces the dimension of data by maximizing the variance of data. Therefore, the principal component analysis of interval-valued data should account for the variance between observations as well as the variation within the observed intervals. In this paper, three principal component analysis methods for interval-valued data are summarized. In addition, a new method is proposed that uses a truncated normal distribution instead of the uniform distribution of the conventional quantile method, because we believe there is more information near the center point of the interval. The methods are compared using simulations and a relevant data set from the OECD. For the quantile method, we draw a scatter plot of the principal components and then identify the position and distribution of the quantiles using the arrow line representation method.

Testing for stochastic order in interval-valued data (구간 자료의 확률적 순서 검정)

  • Choi, Hyejeong;Lim, Johan;Kwak, Minjung;Park, Seongoh
    • The Korean Journal of Applied Statistics, v.32 no.6, pp.879-887, 2019
  • We construct a procedure to test the stochastic order of two samples of interval-valued data. We propose a test statistic that is a U-statistic and derive its asymptotic distribution under the null hypothesis. We compare the performance of the newly proposed method with that of the existing one-sided bivariate Kolmogorov-Smirnov test using real and simulated data.

Forecasting Symbolic Candle Chart-Valued Time Series

  • Park, Heewon;Sakaori, Fumitake
    • Communications for Statistical Applications and Methods, v.21 no.6, pp.471-486, 2014
  • This study introduces a new type of symbolic data, a candle chart-valued time series. We aggregate four stock indices (i.e., open, close, highest and lowest) into one data point to summarize a huge amount of data. In other words, we treat a candle chart, which is constructed from the open, close, highest and lowest stock indices, as a type of symbolic data over a long period. The proposed candle chart-valued time series effectively summarizes and visualizes a huge data set of stock indices, making changes in the indices easy to understand. We also propose novel approaches to modeling the candle chart-valued time series based on a combination of two midpoints and two half ranges: between the highest and the lowest indices, and between the open and the close indices. Furthermore, we propose three types of sum of squares for estimating the candle chart-valued time series model. The proposed methods take into account information not only from ordinary data but also from the interval of each object, and thus can perform effectively in time series modeling (e.g., forecasting a future stock index). To evaluate the proposed methods, we present a real data analysis of the stock market indices of five major Asian countries. The results show that the proposed approaches outperform classical data analysis in forecasting future stock indices.
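
The midpoint and half-range representation described above can be sketched as follows. This is only an illustrative sketch: the `Candle` class, the function names, the dictionary keys, and the choice of a signed open-close half range (so that open and close can be recovered) are assumptions, not details taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candle:
    """One candle: open, close, highest and lowest index for a period."""
    open: float
    close: float
    high: float
    low: float


def to_mid_half(c: Candle) -> dict:
    """Re-express a candle as two midpoints and two half ranges."""
    return {
        "hl_mid": (c.high + c.low) / 2,     # midpoint of highest and lowest
        "hl_half": (c.high - c.low) / 2,    # half range of highest and lowest
        "oc_mid": (c.open + c.close) / 2,   # midpoint of open and close
        "oc_half": (c.close - c.open) / 2,  # signed half range (assumption)
    }


def from_mid_half(f: dict) -> Candle:
    """Rebuild the candle from the four components."""
    return Candle(
        open=f["oc_mid"] - f["oc_half"],
        close=f["oc_mid"] + f["oc_half"],
        high=f["hl_mid"] + f["hl_half"],
        low=f["hl_mid"] - f["hl_half"],
    )


if __name__ == "__main__":
    candle = Candle(open=100.0, close=104.0, high=106.0, low=98.0)
    features = to_mid_half(candle)
    print(features)
    print(from_mid_half(features) == candle)
```

Each of the four component series could then be modeled with an ordinary time series method and the forecasts recombined into a forecast candle.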

Multi-Interval Discretization of Continuous-Valued Attributes for Constructing Incremental Decision Tree (증분 의사결정 트리 구축을 위한 연속형 속성의 다구간 이산화)

  • Baek, Jun-Geol;Kim, Chang-Ouk;Kim, Sung-Shick
    • Journal of Korean Institute of Industrial Engineers, v.27 no.4, pp.394-405, 2001
  • Since most real-world application data involve continuous-valued attributes, properly handling discretization when constructing a decision tree is an important problem. A continuous-valued attribute is typically discretized during decision tree generation by recursively partitioning its range into two intervals. In this paper, by removing the restriction to binary discretization, we present a hybrid multi-interval discretization algorithm that splits the range of a continuous-valued attribute into multiple intervals. An experiment using a semiconductor etching machine verifies that our discretization algorithm constructs a more efficient incremental decision tree than previously proposed discretization algorithms.
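
For context, the conventional binary step that the abstract contrasts with can be sketched as below. The abstract does not say which splitting criterion is used, so this sketch assumes the common choice of minimizing the weighted class entropy of the two sides; the function names are hypothetical, and the paper's hybrid multi-interval algorithm is not reproduced here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def best_binary_cut(values, labels):
    """Return the cut point that minimizes weighted class entropy.

    Candidate cuts are midpoints between consecutive distinct values.
    Returns None when all values are identical.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_cut


if __name__ == "__main__":
    vals = [1, 2, 3, 10, 11, 12]
    labs = ["a", "a", "a", "b", "b", "b"]
    print(best_binary_cut(vals, labs))
```

Applying this step recursively to each side yields the usual binary discretization; a multi-interval method would instead choose several cut points for the attribute at once.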

Symbolic Cluster Analysis for Distribution Valued Dissimilarity

  • Matsui, Yusuke;Minami, Hiroyuki;Misuta, Masahiro
    • Communications for Statistical Applications and Methods, v.21 no.3, pp.225-234, 2014
  • We propose a novel hierarchical clustering method for distribution-valued dissimilarities. Analysis of large and complex data has attracted significant interest. Symbolic Data Analysis (SDA), proposed by Diday in the 1980s, provides a new framework for statistical analysis. In SDA, we analyze an object with internal variation, such as an interval, a histogram or a distribution, called a symbolic object. In this study, we focus on cluster analysis for distribution-valued dissimilarities, one type of symbolic object. Hierarchical clustering generally has two steps: a find step and an update step. In the find step, we find the nearest pair of clusters. We extend this step to distribution-valued dissimilarities by introducing a measure on their order relations. In the update step, dissimilarities between clusters are redefined as a mixture of distributions with a mixing ratio. We present an actual example of the proposed method and a simulation study.

Discretization of Continuous-Valued Attributes considering Data Distribution (데이터 분포를 고려한 연속 값 속성의 이산화)

  • Lee, Sang-Hoon;Park, Jung-Eun;Oh, Kyung-Whan
    • Journal of the Korean Institute of Intelligent Systems, v.13 no.4, pp.391-396, 2003
  • This paper proposes a new approach that converts continuous-valued attributes to categorical ones by considering the distribution of the target attribute (classes). This approach can obtain optimal interval boundaries from the distribution of the data itself, without requiring any parameters. For each attribute, the distribution of the target attribute is projected onto a one-dimensional space. This space is then clustered according to criteria such as the density value of each target class and the amount of overlap among the density values of the target classes. The resulting clusters are based on the probabilities with which the target attribute of an instance can be predicted. The approach therefore yields interval boundaries that minimize the loss of information from the original data. The improved performance of the proposed discretization method is validated using the C4.5 algorithm and data sets from the UCI Machine Learning Data Repository.

A FCA-based Classification Approach for Analysis of Interval Data (구간데이터분석을 위한 형식개념분석기반의 분류)

  • Hwang, Suk-Hyung;Kim, Eung-Hee
    • Journal of the Korea Society of Computer and Information, v.17 no.1, pp.19-30, 2012
  • With internet-based infrastructures such as various information devices, social network systems and cloud computing environments, distributed and sharable data are growing explosively. Recently, Formal Concept Analysis on binary or many-valued data has been successfully applied in many diverse fields as a data analysis and mining technique for extracting, analyzing and classifying inherent and useful knowledge and information. However, little research in formal concept analysis has addressed interval data, whose attributes take interval values. In this paper, we propose a new approach to the classification of interval data based on formal concept analysis. We present the development of a supporting tool (iFCA) that implements the proposed approach for the binarization of an interval data table, concept extraction and the construction of concept hierarchies. Finally, through experiments on real-world data sets, we demonstrate that our approach provides useful and effective ways of analyzing and mining interval data.

Discretization of Continuous-Valued Attributes for Classification Learning (분류학습을 위한 연속 애트리뷰트의 이산화 방법에 관한 연구)

  • Lee, Chang-Hwan
    • The Transactions of the Korea Information Processing Society, v.4 no.6, pp.1541-1549, 1997
  • Many classification algorithms require that training examples contain only discrete values. To use these algorithms when some attributes have continuous numeric values, the numeric attributes must be converted into discrete ones. This paper describes a new way of discretizing numeric values using information theory. Our method is context-sensitive in the sense that it takes into account the value of the target attribute. The amount of information each interval gives about the target attribute is measured using the Hellinger divergence, and the interval boundaries are chosen so that each interval contains as nearly equal an amount of information as possible. To compare our discretization method with some current discretization methods, several popular classification data sets are selected for experiments. We use the back-propagation algorithm and ID3 as classification tools to compare the accuracy of our discretization method with that of other methods.
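
To make the information measure concrete, the sketch below scores an interval by the Hellinger distance between the overall class distribution and the class distribution inside the interval. This is a hedged illustration: the normalized form of the distance (with the factor 1/2 under the square root), the half-open interval convention, and the function names are assumptions, and the paper's exact definition and boundary search may differ.

```python
import math
from collections import Counter


def class_distribution(labels, classes):
    """Relative frequency of each class in a non-empty list of labels."""
    n = len(labels)
    counts = Counter(labels)
    return [counts[c] / n for c in classes]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    return math.sqrt(
        sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2
    )


def interval_information(values, labels, low, high):
    """Information the interval [low, high) carries about the class label.

    Measured as the Hellinger distance between the prior class distribution
    and the class distribution of the examples falling in the interval.
    Returns 0.0 for an empty interval (assumption).
    """
    classes = sorted(set(labels))
    inside = [lab for v, lab in zip(values, labels) if low <= v < high]
    if not inside:
        return 0.0
    prior = class_distribution(labels, classes)
    posterior = class_distribution(inside, classes)
    return hellinger(prior, posterior)


if __name__ == "__main__":
    vals = [1, 2, 3, 4]
    labs = ["a", "a", "b", "b"]
    print(interval_information(vals, labs, 1, 3))  # interval holding only class "a"
    print(interval_information(vals, labs, 1, 5))  # interval holding everything
```

An interval whose class mix matches the overall mix scores zero, while an interval dominated by one class scores higher, which is the sense in which boundaries can be placed to balance information across intervals.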