• Title/Summary/Keyword: high dimensional data sets

72 results found

A Dimension Reduction Method for High-Dimensional Image Patterns Using Relational Discriminant Analysis (Relational Discriminant Analysis를 이용한 고차원 영상패턴의 차원축소)

  • Kim, Sang-Woon; Koo, Byum-Yong
    • Proceedings of the IEEK Conference / 2006.06a / pp.689-690 / 2006
  • Relational discriminant analysis represents an object by its dissimilarities to a set of prototypes extracted from the feature vectors, rather than by the feature vectors themselves. By appropriately selecting a small number of representatives and defining a suitable dissimilarity measure, this paper proposes a method that reduces the dimensionality and achieves better classification performance in both speed and accuracy. Experimental results demonstrate that the proposed mechanism improves performance compared with conventional approaches on artificial data sets.

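To make the dissimilarity-based representation above concrete, the sketch below maps high-dimensional vectors onto their distances to a handful of prototypes; it is only an illustration of the general idea, not the authors' code, and the random prototype selection and Euclidean dissimilarity are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy high-dimensional data: 200 samples, 1000 features, 2 classes.
X = rng.normal(size=(200, 1000))
y = rng.integers(0, 2, size=200)
X[y == 1] += 0.5  # shift class 1 so the classes are separable

# Pick a small number of prototypes (here: random samples).
n_prototypes = 10
proto_idx = rng.choice(len(X), size=n_prototypes, replace=False)
prototypes = X[proto_idx]

# Dissimilarity representation: each sample becomes a vector of its
# Euclidean distances to the prototypes (dimensionality 1000 -> 10).
D = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2)

print(X.shape, "->", D.shape)   # (200, 1000) -> (200, 10)
# Any conventional classifier (e.g., LDA) can now be trained on D.
```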

Compressing Method of NetCDF Files Based on Sparse Matrix (희소행렬 기반 NetCDF 파일의 압축 방법)

  • Choi, Gyuyeun; Heo, Daeyoung; Hwang, Suntae
    • KIISE Transactions on Computing Practices / v.20 no.11 / pp.610-614 / 2014
  • Like many types of scientific data, results from simulations of volcanic ash diffusion take the form of a clustered sparse matrix stored in the netCDF format. Since these data sets are large, they incur high storage and transmission costs. In this paper, we suggest a new method that reduces the size of volcanic ash diffusion simulation data by converting the multi-dimensional index to a single dimension and keeping only the starting point and length of each run of consecutive zeros. This method achieves compression nearly as good as ZIP compression without destroying the netCDF structure. The suggested method is expected to use storage space efficiently by reducing both data size and network transmission time.
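
A rough sketch of the encoding idea described above: flatten the multi-dimensional index to one dimension, keep the non-zero values, and store each run of consecutive zeros as a (start, length) pair. The exact on-disk layout inside netCDF is not reproduced here; the helper names and format below are assumptions.

```python
import numpy as np

def encode_zero_runs(grid):
    """Flatten a multi-dimensional array and record non-zero values plus
    (start, length) pairs for each run of consecutive zeros."""
    flat = grid.ravel()                      # multi-dimensional -> 1-D index
    zero = flat == 0
    # Boundaries where the zero/non-zero state changes.
    edges = np.flatnonzero(np.diff(zero.astype(np.int8)))
    starts = np.concatenate(([0], edges + 1))
    lengths = np.diff(np.concatenate((starts, [flat.size])))
    zero_runs = [(s, l) for s, l, z in zip(starts, lengths, zero[starts]) if z]
    values = flat[~zero]                     # kept non-zero values, in order
    return grid.shape, values, zero_runs

def decode_zero_runs(shape, values, zero_runs):
    flat = np.empty(int(np.prod(shape)), dtype=values.dtype)
    mask = np.ones(flat.size, dtype=bool)
    for start, length in zero_runs:
        mask[start:start + length] = False
    flat[~mask] = 0
    flat[mask] = values
    return flat.reshape(shape)

# Clustered sparse 3-D field, as in an ash-diffusion output.
grid = np.zeros((4, 50, 50))
grid[1, 10:15, 10:15] = 7.0

shape, values, runs = encode_zero_runs(grid)
assert np.array_equal(decode_zero_runs(shape, values, runs), grid)
```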

Analytical study of house wall and air temperature transients under on-off and proportional control for different wall type

  • Han, Kyu-Il
    • Journal of the Korean Society of Fisheries and Ocean Technology / v.46 no.1 / pp.70-81 / 2010
  • A mathematical model is formulated to study the effect of wall mass on the thermal performance of four houses of different construction. This analytical study was motivated by the experimental work of Burch et al. An analytical solution of the one-dimensional, linear, partial differential equation for wall temperature profiles and room air temperature is obtained using the Laplace transform method. Typical Meteorological Year data are processed to yield hourly average monthly values, and these discrete data are then converted to a continuous, time-dependent form using a Fast Fourier Transform method. The study uses weather data from four locations in the United States, Albuquerque, New Mexico; Miami, Florida; Santa Maria, California; and Washington, D.C., for both winter and summer conditions. A computer code is developed to calculate the wall temperature profile, room air temperature, and energy consumption loads. Three sets of results are calculated: one with no auxiliary energy and two with different control mechanisms, an on-off controller and a proportional controller, and the two controllers are compared. Heavyweight houses with insulation in mild climates (such as Santa Maria, California in August) show a high comfort level, and houses using proportional control achieve a higher comfort level than houses using on-off control. The results show that wall mass affects the thermal performance of a heavily constructed house in mild weather conditions.
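
As a rough illustration of the two control mechanisms compared in the paper, the sketch below drives a single lumped room-air node with an on-off and a proportional heater; it is not the authors' one-dimensional wall model, and all parameter values are assumptions.

```python
import numpy as np

# Lumped room-air model: C*dT/dt = UA*(T_out - T) + Q_aux
C, UA = 5.0e6, 250.0            # thermal capacitance (J/K) and loss coefficient (W/K)
Q_max, T_set = 5000.0, 20.0     # heater capacity (W) and setpoint (deg C)
dt, hours = 60.0, 48            # 1-minute steps over two days
t = np.arange(0, hours * 3600.0, dt)
T_out = 5.0 + 5.0 * np.sin(2.0 * np.pi * t / 86400.0)   # daily outdoor swing

def on_off(T):                     # thermostat without a dead band
    return Q_max if T < T_set else 0.0

def proportional(T, gain=2000.0):  # output clipped to the heater capacity
    return min(Q_max, max(0.0, gain * (T_set - T)))

def simulate(controller):
    T, history, energy = 15.0, [], 0.0
    for T_o in T_out:
        Q = controller(T)
        T += dt / C * (UA * (T_o - T) + Q)   # explicit Euler step
        energy += Q * dt
        history.append(T)
    return np.array(history), energy / 3.6e6   # kWh

for name, ctrl in (("on-off", on_off), ("proportional", proportional)):
    T_hist, kwh = simulate(ctrl)
    print(f"{name:12s} mean |T - T_set| = {abs(T_hist - T_set).mean():.2f} K, "
          f"auxiliary energy = {kwh:.1f} kWh")
```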

Support Vector Machine for Interval Regression

  • Hong, Dug Hun; Hwang, Changha
    • Proceedings of the Korean Statistical Society Conference / 2004.11a / pp.67-72 / 2004
  • Support vector machines (SVM) have been very successful in pattern recognition and function estimation problems for crisp data. This paper proposes a new method for evaluating interval linear and nonlinear regression models that combines the possibility and necessity estimation formulation with the principle of SVM. For data sets with crisp inputs and interval outputs, possibility and necessity models have recently been utilized; they are based on a quadratic programming approach that gives more diverse spread coefficients than a linear programming one. SVM also uses a quadratic programming approach, whose further advantage in interval regression analysis is the ability to integrate both the property of central tendency in least squares and the possibilistic property in fuzzy regression. However, this is not a computationally expensive approach. SVM allows us to perform interval nonlinear regression analysis by constructing an interval linear regression function in a high-dimensional feature space, which makes it a very attractive way to model nonlinear interval data. The proposed algorithm is model-free in the sense that the underlying model function need not be assumed for an interval nonlinear regression model with crisp inputs and interval output. Experimental results are presented that indicate the performance of the algorithm.

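As a loose illustration of interval regression with crisp inputs and interval outputs, the sketch below fits two plain epsilon-SVRs to the lower and upper interval endpoints; this is a simplified stand-in for the paper's possibility/necessity quadratic-programming formulation, and the data and hyperparameters are assumptions.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)

# Crisp inputs, interval outputs [y_lower, y_upper].
x = np.sort(rng.uniform(-3, 3, size=80))[:, None]
center = np.sin(x).ravel() + 0.1 * rng.normal(size=80)
spread = 0.3 + 0.1 * np.abs(x).ravel()
y_lower, y_upper = center - spread, center + spread

# Two kernel SVRs approximate the lower and upper envelopes; the pair of
# predictions is read as an interval-valued regression function.
lower_model = SVR(kernel="rbf", C=10.0, epsilon=0.05).fit(x, y_lower)
upper_model = SVR(kernel="rbf", C=10.0, epsilon=0.05).fit(x, y_upper)

x_new = np.array([[0.0], [2.0]])
for xi, lo, hi in zip(x_new.ravel(),
                      lower_model.predict(x_new),
                      upper_model.predict(x_new)):
    print(f"x = {xi:+.1f}  ->  predicted interval [{lo:.2f}, {hi:.2f}]")
```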

Performance Enhancement of a DVA-tree by the Independent Vector Approximation (독립적인 벡터 근사에 의한 분산 벡터 근사 트리의 성능 강화)

  • Choi, Hyun-Hwa; Lee, Kyu-Chul
    • The KIPS Transactions: Part D / v.19D no.2 / pp.151-160 / 2012
  • Most distributed high-dimensional indexing structures provide reasonable search performance when the dataset is uniformly distributed, but their search performance gradually degrades when the dataset is clustered or skewed. We propose a method for improving the k-nearest-neighbor search performance of the distributed vector approximation tree on strongly clustered or skewed datasets. The basic idea is to compute the volumes of the leaf nodes in the top tree of a distributed vector approximation tree and to assign a different number of bits to each of them so that the vector approximation retains its discriminating power; in other words, more bits are assigned to the high-density clusters. We conducted experiments comparing the search performance with the distributed hybrid spill-tree and the distributed vector approximation tree using synthetic and real data sets. The experimental results show that the proposed scheme consistently yields significant performance improvements over the distributed vector approximation tree for strongly clustered or skewed datasets.
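
The bit-assignment idea above can be sketched as follows: leaves with higher point density (smaller volume per point) receive more approximation bits per dimension. The log-proportional allocation and all constants below are illustrative assumptions, not the paper's exact rule.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two toy leaf-node clusters of very different density in 2-D.
dense = rng.normal(loc=0.2, scale=0.01, size=(900, 2))
sparse = rng.normal(loc=0.8, scale=0.15, size=(100, 2))
clusters = [dense, sparse]

total_bits = 12   # assumed per-dimension bit budget shared across leaf nodes

# Assign more bits to leaves with more points per unit volume
# (log-proportional allocation is an illustrative heuristic).
volumes = np.array([np.prod(c.max(axis=0) - c.min(axis=0)) for c in clusters])
densities = np.array([len(c) for c in clusters]) / volumes
weights = np.log(densities)
bits = np.maximum(1, np.round(total_bits * weights / weights.sum())).astype(int)

for c, b in zip(clusters, bits):
    lo, hi = c.min(axis=0), c.max(axis=0)
    cells = 2 ** b                                     # quantization cells per dimension
    codes = np.minimum(np.floor((c - lo) / (hi - lo) * cells).astype(int), cells - 1)
    approx = lo + (codes + 0.5) / cells * (hi - lo)    # cell-centre approximation
    err = np.linalg.norm(c - approx, axis=1).mean()
    print(f"{b:2d} bits/dim over {len(c):4d} points -> mean approximation error {err:.4f}")
```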

Discrimination between Earthquakes and Explosions Recorded by the KSRS Seismic Array in Wonju, Korea (원주 KSRS 지진 관측망에 기록된 지진과 폭발 식별 연구)

  • Jeong, Seong Ju; Che, Il-Young; Kang, Tae-Seob
    • Geophysics and Geophysical Exploration / v.17 no.3 / pp.137-146 / 2014
  • This study presents a procedure for discriminating artificial events from earthquakes occurring in and around the Korean Peninsula using data recorded by the Wonju KSRS seismograph network, Korea. Two training sets representing natural and artificial earthquakes were constructed with 150 and 56 events, respectively, all with high signal-to-noise ratio. A frequency band pair, Pg(4-6 Hz)/Lg(5-7 Hz), optimal for discriminating seismic sources, was derived from a two-dimensional grid of Pg/Lg spectral amplitude ratios. Corrections for the effects of earthquake magnitude and hypocentral distance were applied to improve the discrimination capability. To correct the magnitude dependence caused by the inverse proportionality of corner frequency to seismic moment, the Brune source spectrum was subtracted from the observed spectrum; the spectrum was then corrected with an optimal damping coefficient to remove the attenuation effect of hypocentral distance, and the locally varying spectral ratio was cancelled by correcting for variations in wave propagation along the ray path. The discrimination performance between the natural and artificial training sets was compared using the Mahalanobis distance at each correction step. The magnitude, distance, and path corrections show clear improvements in the discrimination results, with the Mahalanobis distance between the two training sets increasing from 1.98 to 3.01.
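
The Mahalanobis-distance criterion used to measure the separation between the two training sets can be sketched as follows; the feature values are synthetic and the pooled-covariance form is an assumption about the exact definition used.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy log10(Pg/Lg) spectral-ratio features for two training sets
# (synthetic values; the paper uses Pg 4-6 Hz over Lg 5-7 Hz ratios).
earthquakes = rng.normal(loc=[-0.2, -0.1], scale=0.15, size=(150, 2))
explosions  = rng.normal(loc=[ 0.3,  0.4], scale=0.15, size=(56, 2))

def mahalanobis_separation(a, b):
    """Mahalanobis distance between two sample means using pooled covariance."""
    pooled = ((len(a) - 1) * np.cov(a, rowvar=False) +
              (len(b) - 1) * np.cov(b, rowvar=False)) / (len(a) + len(b) - 2)
    diff = a.mean(axis=0) - b.mean(axis=0)
    return float(np.sqrt(diff @ np.linalg.inv(pooled) @ diff))

# A larger separation after magnitude/distance/path corrections would show
# up as a larger Mahalanobis distance between the two training sets.
print(f"Mahalanobis distance: {mahalanobis_separation(earthquakes, explosions):.2f}")
```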

Exploring the Performance of Multi-Label Feature Selection for Effective Decision-Making: Focusing on Sentiment Analysis (효과적인 의사결정을 위한 다중레이블 기반 속성선택 방법에 관한 연구: 감성 분석을 중심으로)

  • Jong Yoon Won; Kun Chang Lee
    • Information Systems Review / v.25 no.1 / pp.47-73 / 2023
  • Management decision-making based on artificial intelligence (AI) plays an important role in supporting decision-makers, and business decision-making centered on AI is regarded as a driving force for corporate growth. AI based on accurate analysis techniques can support decision-makers in making high-quality decisions. This study proposes an effective decision-making method using multi-label feature selection. We present CFS-BR (Correlation-based Feature Selection based on the Binary Relevance approach), which reduces data sets in high-dimensional space. Analysis of sample and empirical data shows that CFS-BR can support efficient decision-making by selecting the best combination of meaningful attributes using the best-first search algorithm. In addition, compared to previous multi-label feature selection methods, CFS-BR achieves higher accuracy and is therefore useful for increasing the effectiveness of decision-making.
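
A rough sketch of the CFS-BR idea: decompose the multi-label problem by binary relevance, score features by their average correlation with the labels, and search for the subset with the highest CFS merit. A greedy forward search stands in here for the best-first search, and the toy data and all constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy multi-label data: 300 samples, 20 features, 3 binary labels.
X = rng.normal(size=(300, 20))
Y = np.column_stack([(X[:, j] + 0.5 * rng.normal(size=300)) > 0
                     for j in (0, 3, 7)]).astype(float)

def abs_corr(a, b):
    return abs(np.corrcoef(a, b)[0, 1])

# Binary relevance: score each feature against every label separately, then average.
f_label = np.array([[abs_corr(X[:, f], Y[:, l]) for l in range(Y.shape[1])]
                    for f in range(X.shape[1])]).mean(axis=1)
f_feat = np.abs(np.corrcoef(X, rowvar=False))

def cfs_merit(subset):
    """CFS merit: high feature-label correlation, low feature-feature redundancy."""
    k = len(subset)
    rcf = f_label[subset].mean()
    rff = f_feat[np.ix_(subset, subset)][~np.eye(k, dtype=bool)].mean() if k > 1 else 0.0
    return k * rcf / np.sqrt(k + k * (k - 1) * rff)

# Greedy forward search (a simplified stand-in for best-first search).
selected, remaining = [], list(range(X.shape[1]))
while remaining:
    best = max(remaining, key=lambda f: cfs_merit(selected + [f]))
    if selected and cfs_merit(selected + [best]) <= cfs_merit(selected):
        break
    selected.append(best)
    remaining.remove(best)

print("selected features:", selected)
```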

A Feature Selection-based Ensemble Method for Arrhythmia Classification

  • Namsrai, Erdenetuya; Munkhdalai, Tsendsuren; Li, Meijing; Shin, Jung-Hoon; Namsrai, Oyun-Erdene; Ryu, Keun Ho
    • Journal of Information Processing Systems / v.9 no.1 / pp.31-40 / 2013
  • In this paper, a novel method is proposed for building an ensemble of classifiers using a feature selection schema. The feature selection schema identifies the best feature sets affecting arrhythmia classification. First, a number of feature subsets are extracted by applying the feature selection schema to the original dataset. Classification models are then built from each feature subset. Finally, the classification models are combined into an ensemble by a voting approach. The voting approach involves both the classification error rate and the feature selection rate to calculate the score of each classifier in the ensemble, where the feature selection rate depends on the extraction order of the feature subsets. In the experiment, the method was applied to an arrhythmia dataset and generated the top three disjoint feature sets. Three classifiers were built on these feature subsets and combined into a classifier ensemble using the voting approach. The method improves classification accuracy on a high-dimensional dataset: the performance of each classifier and of their ensemble was higher than that of a classifier based on the whole feature space, so the proposed approach yields improved classification performance and a more stable classification model.
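
The voting scheme can be sketched roughly as below: each classifier trained on one extracted feature subset is weighted by a score combining its accuracy and a selection rate that decays with extraction order. The score formula, the decay values, and the toy data are assumptions, not the paper's exact definitions, and for brevity the same held-out split is reused for scoring and evaluation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for the arrhythmia data, with 3 disjoint feature subsets.
X, y = make_classification(n_samples=600, n_features=30, n_informative=10, random_state=5)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=5)

subsets = [list(range(0, 10)), list(range(10, 20)), list(range(20, 30))]
selection_rate = [1.0, 0.8, 0.6]   # assumed to decay with extraction order

models, scores = [], []
for feats, rate in zip(subsets, selection_rate):
    clf = DecisionTreeClassifier(random_state=5).fit(X_tr[:, feats], y_tr)
    error = 1.0 - clf.score(X_te[:, feats], y_te)
    models.append((clf, feats))
    scores.append((1.0 - error) + rate)   # classifier score: accuracy plus selection rate

# Score-weighted voting over the ensemble members.
votes = np.zeros((len(X_te), 2))
for (clf, feats), w in zip(models, scores):
    votes[np.arange(len(X_te)), clf.predict(X_te[:, feats])] += w
ensemble_acc = (votes.argmax(axis=1) == y_te).mean()
print(f"ensemble accuracy: {ensemble_acc:.3f}")
```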

Seismic Tomography using Graph Theoretical Ray Tracing

  • Keehm, Young-Seuk; Baag, Chang-Eob; Lee, Jung-Mo
    • International Union of Geodesy and Geophysics Korean Journal of Geophysical Research / v.25 no.1 / pp.23-34 / 1997
  • Seismic tomography using the graph-theoretical method of ray tracing is performed on two synthetic data sets with laterally varying velocity structures. Straight-ray tomography images the laterally varying velocity structure so poorly that ray-traced tomographic techniques should be used instead. Conventional ray tracing methods have serious drawbacks, namely convergence and local-minimum problems, when applied to seismic tomography. The graph-theoretical method finds well-approximated ray paths in rapidly varying media, even in shadow zones where shooting methods run into convergence problems, and it guarantees the globally minimal travel-time ray path, whereas bending methods often suffer from local minima. The graph-theoretical method is especially efficient when many sources and receivers exist, since it finds the travel times and corresponding ray paths from a specific source to all receivers at once. Moreover, the algorithm is easily extended to ray tracing in anisotropic media and to the three-dimensional case. Among row-action inversion techniques, the conjugate gradient (CG) method is used because of its fast convergence and high efficiency. The iterative sequence of ray tracing by the graph-theoretical method and inversion by the CG method is an efficient and robust algorithm for seismic tomography in laterally varying velocity structures.

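The forward step of this approach, graph-theoretical ray tracing, amounts to a shortest-path computation over a grid of slowness nodes; the sketch below computes first-arrival travel times from one source to all nodes with Dijkstra's algorithm on an 8-connected grid. The velocity model and grid parameters are assumptions, and the conjugate-gradient inversion step is not shown.

```python
import heapq
import numpy as np

# Toy 2-D velocity model with a laterally varying low-velocity zone.
nz, nx, h = 30, 40, 10.0                 # grid size and node spacing (m)
velocity = np.full((nz, nx), 2000.0)     # m/s
velocity[10:20, 15:25] = 1200.0          # slow anomaly
slowness = 1.0 / velocity

def dijkstra_traveltimes(src):
    """Shortest-path travel times from one source node to every grid node
    (graph-theoretical ray tracing with 8-connected neighbours)."""
    t = np.full((nz, nx), np.inf)
    t[src] = 0.0
    heap = [(0.0, src)]
    offsets = [(dz, dx) for dz in (-1, 0, 1) for dx in (-1, 0, 1) if (dz, dx) != (0, 0)]
    while heap:
        ti, (i, j) = heapq.heappop(heap)
        if ti > t[i, j]:
            continue
        for dz, dx in offsets:
            ni, nj = i + dz, j + dx
            if 0 <= ni < nz and 0 <= nj < nx:
                dist = h * np.hypot(dz, dx)
                tn = ti + dist * 0.5 * (slowness[i, j] + slowness[ni, nj])
                if tn < t[ni, nj]:
                    t[ni, nj] = tn
                    heapq.heappush(heap, (tn, (ni, nj)))
    return t

# Travel times from a single source to all receivers along the surface row.
times = dijkstra_traveltimes((0, 0))
print(np.round(times[0, ::10], 4))   # seconds at every 10th surface node
```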

An Enhanced Neural Network Approach for Numeral Recognition

  • Venugopal, Anita; Ali, Ashraf
    • International Journal of Computer Science & Network Security / v.22 no.3 / pp.61-66 / 2022
  • Object classification is one of the main fields in neural networks and has attracted the interest of many researchers. Although there have been vast advancements in this area, many challenges remain, owing to inefficiency in handling large data and to linguistic and dimensional complexity. Powerful hardware and software approaches to neural networks, such as deep neural networks, provide efficient mechanisms that contribute greatly to object recognition and to time-series classification. Because of their high prediction accuracy, neural networks are often preferred in applications that require feature-based identification, segmentation, and detection. The self-learning ability of neural networks has revolutionized computing and finds application in numerous fields, such as unmanned self-driving vehicles and speech recognition. In this paper, an experiment is conducted to implement a neural approach that identifies numbers in different formats without human intervention, and measures are taken to improve the efficiency with which machines classify and identify numbers. Experimental results show the importance of having adequate training sets to achieve better recognition accuracy.
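
As a minimal illustration of training-set-driven numeral recognition, the sketch below fits a small fully connected network to the scikit-learn digits set; it is not the authors' architecture or data, and all hyperparameters are assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Small built-in 8x8 digit images stand in for the paper's numeral data.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data / 16.0, digits.target, test_size=0.25, random_state=0)

# One hidden layer is enough for this toy set; larger training sets and
# deeper networks would be needed for varied real-world numeral formats.
net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
net.fit(X_train, y_train)
print(f"test accuracy: {net.score(X_test, y_test):.3f}")
```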