• Title/Summary/Keyword: data reduction

Comparative Study of Dimension Reduction Methods for Highly Imbalanced Overlapping Churn Data

  • Lee, Sujee;Koo, Bonhyo;Jung, Kyu-Hwan
    • Industrial Engineering and Management Systems / v.13 no.4 / pp.454-462 / 2014
  • Retaining customers who are likely to churn is one of the most important issues in customer relationship management, so companies try to predict churning customers from their large-scale, high-dimensional data. This study focuses on handling such large data sets by reducing their dimensionality. We compare six dimension reduction methods: principal component analysis (PCA), factor analysis (FA), locally linear embedding (LLE), local tangent space alignment (LTSA), locality preserving projections (LPP), and a deep auto-encoder. Our experiments apply each method to the training data, build a classification model on the mapped data, and then measure performance by hit rate. PCA performs well despite its simplicity, and the deep auto-encoder gives the best overall performance. These results can be explained by the characteristics of churn prediction data, which are highly correlated and overlap across classes. We also propose a simple out-of-sample extension method for the nonlinear dimension reduction methods LLE and LTSA that exploits these characteristics of the data.
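
As a rough illustration of the simplest method in that comparison, the sketch below projects highly correlated features onto their first principal component. It is not the study's pipeline: the toy data, the function names, and the use of power iteration (instead of a full eigendecomposition) are all assumptions made to keep the example dependency-free.

```python
import math

def center(rows):
    """Subtract the per-feature mean; return the centered rows and the means."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows], means

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iters=200):
    """Leading eigenvector of cov by power iteration (assumes cov is non-zero)."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(rows, means, comp):
    """Map each row to a single score along the component."""
    return [sum((r[j] - means[j]) * comp[j] for j in range(len(comp)))
            for r in rows]

# Hypothetical toy data: three perfectly correlated features.
rows = [[float(t), 2.0 * t, -1.0 * t] for t in range(10)]
centered, means = center(rows)
comp = first_component(covariance(centered))
scores = project(rows, means, comp)  # 3 features reduced to 1
```

The one-dimensional `scores` could then be passed to any classifier; when features are as strongly correlated as the abstract describes, little information is lost in the projection.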

Novel Intent based Dimension Reduction and Visual Features Semi-Supervised Learning for Automatic Visual Media Retrieval

  • Kunisetti, Subramanyam;Ravichandran, Suban
    • International Journal of Computer Science & Network Security / v.22 no.6 / pp.230-240 / 2022
  • Sharing online videos over the internet is an emerging and important concept in applications such as surveillance and mobile video search. A personalized web video retrieval system is therefore needed to explore relevant videos and to help people who are searching for videos related to specific big data content. To support this process, reduced-dimensionality attributes/features are computed from videos to capture discriminative aspects of a scene based on shape, histogram, texture, object annotation, coordinates, color, and contour data. Dimensionality reduction depends mainly on feature extraction and feature selection in multi-label retrieval from multimedia data. Many researchers have implemented techniques that reduce dimensionality based on the visual features of video data, but each technique has advantages and disadvantages when applied to advanced features in video retrieval. In this research, we present a Novel Intent based Dimension Reduction Semi-Supervised Learning Approach (NIDRSLA) that examines dimensionality reduction for exact and fast video retrieval based on different visual features. For dimensionality reduction, NIDRSLA learns the projection matrix by increasing the dependence between the enlarged data and the projected-space features. The proposed approach also addresses the aforementioned issue (i.e., video segmentation with frame selection using low-level and high-level features) with efficient object annotation for video representation. Experiments performed on a synthetic data set demonstrate the efficiency of the proposed approach compared with traditional state-of-the-art video retrieval methodologies.

Comparison of the Performance of Clustering Analysis using Data Reduction Techniques to Identify Energy Use Patterns

  • Song, Kwonsik;Park, Moonseo;Lee, Hyun-Soo;Ahn, Joseph
    • International conference on construction engineering and project management / 2015.10a / pp.559-563 / 2015
  • Identifying energy use patterns in buildings offers a great opportunity for energy saving. To find which energy use patterns exist, clustering methods such as K-means and hierarchical clustering have commonly been used. For high-dimensional data such as energy use time series, data reduction should be considered to avoid the curse of dimensionality. Principal Component Analysis, Autocorrelation Function, Discrete Fourier Transform, and Discrete Wavelet Transform have been widely used to map the original data into lower-dimensional spaces. However, an open issue remains because the performance of clustering analysis depends on data type, purpose, and application. Therefore, we need to understand which data reduction techniques are suitable for energy use management. This research aims to find the best clustering method using energy use data obtained from the Seoul National University campus. The results show that most experiments with data reduction techniques perform better. The results also help facility managers optimally control energy systems such as HVAC to reduce energy use in buildings.
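
One of the reduction techniques named above, the Discrete Fourier Transform, can be sketched as follows: a daily load profile is replaced by the magnitudes of its first few Fourier coefficients before clustering. The function name, the choice to keep magnitudes only, and the synthetic 24-point profile are assumptions for illustration, not the study's setup.

```python
import cmath
import math

def dft_features(series, k):
    """Return the normalized magnitudes of the first k DFT coefficients."""
    n = len(series)
    feats = []
    for f in range(k):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * f * t / n)
                    for t, x in enumerate(series))
        feats.append(abs(coeff) / n)
    return feats

# Hypothetical hourly profile: one sinusoidal cycle per day.
profile = [math.sin(2 * math.pi * t / 24) for t in range(24)]
reduced = dft_features(profile, 3)  # 24 values reduced to 3 features
```

Each building-day then becomes a short feature vector that K-means or hierarchical clustering can compare with far fewer dimensions than the raw time series.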

CAD Model Generation from Point Clouds using 3D Grid Method (Grid 방법을 이용한 측정 점데이터로부터의 CAD모델 생성에 관한 연구)

  • 우혁제;강의철;이관행
    • Proceedings of the Korean Society of Precision Engineering Conference / 2001.04a / pp.435-438 / 2001
  • Reverse engineering technology refers to the process of creating a CAD model of an existing part using measuring devices. Recently, non-contact scanning devices have become more accurate, and the speed of data acquisition has increased drastically. However, they generate thousands of points per second and various types of point data, so handling the huge amount and variety of point data becomes a major issue. To generate a CAD model from scanned point data efficiently, the point data should be well arranged through point data handling processes such as data reduction and segmentation. This paper proposes a new point data handling method using 3D grids. The geometric information of a part is extracted from point cloud data by estimating the normal values of the points. Non-uniform 3D grids for data reduction and segmentation are then generated based on this geometric information. Through these data reduction and segmentation processes, CAD models can be created automatically and efficiently. The proposed method is applied to two quadric models and the results are discussed.
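
To make the idea of grid-based point reduction concrete, here is a minimal sketch that uses a uniform grid, which is a simpler stand-in for the non-uniform, normal-driven grids the paper describes. Each occupied cell is replaced by the centroid of its points; the function name and cell size are hypothetical.

```python
import math
from collections import defaultdict

def grid_reduce(points, cell):
    """Replace all points falling in the same cubic cell by their centroid."""
    cells = defaultdict(list)
    for x, y, z in points:
        key = (math.floor(x / cell), math.floor(y / cell), math.floor(z / cell))
        cells[key].append((x, y, z))
    reduced = []
    for pts in cells.values():
        n = len(pts)
        reduced.append(tuple(sum(p[i] for p in pts) / n for i in range(3)))
    return reduced

# Hypothetical scan: two nearby points collapse into one, the third survives.
cloud = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (1.5, 0.1, 0.1)]
thinned = grid_reduce(cloud, cell=1.0)
```

A non-uniform variant would shrink the cell size where the estimated normals change quickly, so that curved regions keep more points than flat ones.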

Bit Error Reduction for Holographic Data Storage System Using Subclustering (서브클러스터링을 이용한 홀로그래픽 정보저장 시스템의 비트 에러 보정 기법)

  • Kim, Sang-Hoon;Yang, Hyun-Seok;Park, Young-Pil
    • Transactions of the Society of Information Storage Systems / v.6 no.1 / pp.31-36 / 2010
  • Data storage requires high storage capacity, a fast transfer rate, and short access time for writing and retrieving data. No current data storage system satisfies all of these conditions; however, a holographic data storage system can achieve a faster data transfer rate because it is a page-oriented memory system that uses volume holograms to write and retrieve data. The system can be constructed without mechanical actuating parts, so a fast data transfer rate and a high storage capacity of about 1 Tb/cm³ can be realized. In this research, a new error reduction method is suggested to correct errors in binary data stored in a holographic data storage system. First, cluster centers are found using the subtractive clustering algorithm; then the intensities of pixels around the cluster centers are reduced. With this error reduction method, the effect of inter-pixel interference noise in the holographic data storage system decreases and the intensity profile of the data page becomes uniform, so a better data storage system can be constructed.
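
The first step of that method, finding cluster centers by subtractive clustering, can be sketched with the commonly used potential-based formulation. The radii, the simplified stopping rule (`stop_ratio`), and the toy points are assumptions; the abstract does not state the parameters the authors used, and the pixel-intensity adjustment step is not shown.

```python
import math

def subtractive_centers(points, ra=1.0, rb=None, stop_ratio=0.15, max_centers=10):
    """Pick cluster centers as successive peaks of a density-like potential."""
    rb = 1.5 * ra if rb is None else rb          # assumed default squash radius
    alpha, beta = 4.0 / ra ** 2, 4.0 / rb ** 2

    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Potential of each point: high when many neighbours lie within ~ra.
    potential = [sum(math.exp(-alpha * d2(p, q)) for q in points) for p in points]
    centers, first_peak = [], None
    while len(centers) < max_centers:
        i = max(range(len(points)), key=potential.__getitem__)
        peak = potential[i]
        if first_peak is None:
            first_peak = peak
        elif peak < stop_ratio * first_peak:     # simplified stopping rule
            break
        centers.append(points[i])
        # Subtract the chosen center's influence so nearby points are not re-picked.
        potential = [p - peak * math.exp(-beta * d2(pt, points[i]))
                     for p, pt in zip(potential, points)]
    return centers

# Hypothetical bright-pixel coordinates forming two tight groups.
pixels = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (4.9, 5.0), (5.0, 4.9)]
centers = subtractive_centers(pixels)
```

In the setting described above, the returned centers would mark the locations around which pixel intensities are then reduced.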

Evaluation of Multivariate Stream Data Reduction Techniques (다변량 스트림 데이터 축소 기법 평가)

  • Jung, Hung-Jo;Seo, Sung-Bo;Cheol, Kyung-Joo;Park, Jeong-Seok;Ryu, Keun-Ho
    • The KIPS Transactions:PartD / v.13D no.7 s.110 / pp.889-900 / 2006
  • Although sensor networks differ in user requests and data characteristics depending on the application area, existing research on the stream data transmission problem focuses on improving the performance of individual methods rather than considering the original characteristics of stream data. In this paper, we introduce a hierarchical or distributed sensor network architecture and data model, and then evaluate multivariate data reduction methods suited to user requirements and data features so that reduction methods can be applied selectively. To assess the relative performance of the multivariate data reduction methods, we used conventional techniques such as Wavelet, HCL (Hierarchical Clustering), Sampling, and SVD (Singular Value Decomposition), along with experimental data sets such as multivariate time series, synthetic data, and robot execution failure data. The experimental results show that the SVD and Sampling methods are superior to Wavelet and HCL with respect to relative error ratio and execution time. In particular, because the relative error ratio of each data reduction method varies with the data characteristics, selecting the reduction method according to the experimental data set gives good performance. The findings reported in this paper can serve as a useful guideline for designing and constructing sensor network applications that involve multivariate stream data.
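
The kind of evaluation described above can be sketched for the Sampling method: reduce a stream by keeping every `step`-th reading, reconstruct it, and measure the error. The definition of relative error used here (norm of the residual divided by norm of the original) and the linear-interpolation reconstruction are assumptions, since the abstract does not define them; a multivariate stream would apply this per variable.

```python
import math

def sample_reduce(series, step):
    """Keep every step-th reading of the stream."""
    return series[::step]

def reconstruct(samples, step, n):
    """Rebuild n readings by linear interpolation between kept samples."""
    out = []
    for t in range(n):
        i, frac = divmod(t, step)
        if i + 1 < len(samples):
            out.append(samples[i] + (samples[i + 1] - samples[i]) * frac / step)
        else:
            out.append(samples[-1])  # hold the last kept value at the tail
    return out

def relative_error(original, approx):
    """Assumed metric: ||original - approx|| / ||original||."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(original, approx)))
    den = math.sqrt(sum(a * a for a in original))
    return num / den

# Hypothetical sensor readings: a smooth ramp and a rapidly alternating signal.
ramp = [float(t) for t in range(10)]
zigzag = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
ramp_err = relative_error(ramp, reconstruct(sample_reduce(ramp, 3), 3, len(ramp)))
zig_err = relative_error(zigzag, reconstruct(sample_reduce(zigzag, 2), 2, len(zigzag)))
```

The smooth ramp is recovered without loss while the alternating signal is lost entirely, which illustrates why the error of a reduction method depends on the characteristics of the data.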

Work load analysis for determination of the reduction gear ratio for a 78 kW all wheel drive electric tractor design

  • Kim, Wan-Soo;Baek, Seung-Yun;Kim, Taek-Jin;Kim, Yeon-Soo;Park, Seong-Un;Choi, Chang-Hyun;Hong, Soon-Jung;Kim, Yong-Joo
    • Korean Journal of Agricultural Science / v.46 no.3 / pp.613-627 / 2019
  • The purpose of this study was to design a powertrain for a 78 kW AWD (all wheel drive) electric tractor by analyzing the combination of various reduction gear ratios on a commercial motor using data from actual agricultural work and driving conditions. A load measurement system was constructed to collect data using wheel torque meters, proximity sensors, and a data acquisition system. Field experiments for measuring load data were performed for two environmental driving conditions (on asphalt and soil) and four agricultural operations (plow tillage, rotary tillage, loader operation, and baler operation). The attached implements and gear stages were selected through farmer surveys. The range of the reduction ratio was determined by selecting the minimum reduction ratio needed to satisfy the torque condition required for agricultural operations and the maximum reduction gear ratio to satisfy the maximum travel speed. The minimum reduction gear ratio selected was 57 in consideration of the working load condition and the maximum reduction gear ratio selected was 62 considering the maximum running speed. In the range of the reduction gear ratio 57 - 62, the selected motor satisfied all working torque conditions. As a result, the combination of the selected motor and reduction gear ratio was applicable for satisfying the loads required during agricultural operation and driving operation.
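
The two bounds described above can be sketched with basic drivetrain relations: the minimum ratio is the one that lets the motor reach the required axle torque, and the maximum ratio is the one that still allows the target travel speed at the motor's top speed. All numbers, parameter names, and the lossless default efficiency below are hypothetical; they are not taken from the study's measurements.

```python
import math

def min_ratio_for_torque(required_axle_torque_nm, motor_torque_nm, efficiency=1.0):
    """Smallest reduction ratio that delivers the required axle torque."""
    return required_axle_torque_nm / (motor_torque_nm * efficiency)

def max_ratio_for_speed(motor_max_rpm, wheel_radius_m, target_speed_kmh):
    """Largest reduction ratio that still reaches the target travel speed."""
    wheel_rpm_needed = (target_speed_kmh / 3.6) / (2 * math.pi * wheel_radius_m) * 60
    return motor_max_rpm / wheel_rpm_needed

# Hypothetical inputs chosen only to exercise the formulas.
lower = min_ratio_for_torque(required_axle_torque_nm=11400, motor_torque_nm=200)
upper = max_ratio_for_speed(motor_max_rpm=6000, wheel_radius_m=0.7, target_speed_kmh=40)
feasible = lower <= upper  # a usable ratio exists only if the bounds do not cross
```

Any ratio between `lower` and `upper` satisfies both constraints under these assumptions, which mirrors how the study arrives at a feasible range rather than a single value.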

Trading Risk Reduction Effects for Currency Futures Markets (통화선물거래의 거래위험 감소효과에 관한 연구)

  • Choi, Heung Sik;Kim, Sun Woong;Park, Eun-Jin
    • Journal of Information Technology Applications and Management / v.21 no.4 / pp.1-13 / 2014
  • This study aims to show the risk reduction effects of a round-the-clock trading environment. We analyse the trading results of currency futures contracts on CME Globex, which are open 23 hours a day. These include the Euro FX, Japanese Yen, Australian Dollar, and British Pound contracts from January 2005 to August 2013. We generate new price series using only daytime prices from a period of about 7 hours. This hypothetical "G" data series may have greater gap risk than the original "R" data series. Empirical results show the trading risk reduction effects; that is, the R data series have higher profits and lower risks than the G data series.

A Three Steps Data Reduction Model for Healthcare Systems (헬스케어 시스템을 위한 세단계 데이터 축소 모델)

  • Ali, Rahman;Lee, Sungyoung;Chung, Tae Choong
    • Proceedings of the Korea Information Processing Society Conference / 2013.05a / pp.474-475 / 2013
  • In healthcare systems, the accuracy of a classifier for classifying medical diseases depends on a reduced dataset. The key to achieving accurate classification results is reducing the data to an optimal number of significant features. The initial step in data reduction is integrating heterogeneous data sources into a unified, reduced dataset. This dataset is further reduced by considering the range of values of all the attributes, and finally the least significant features are filtered out and dropped. This paper proposes a three-step data reduction model that plays a vital role in the classification process.
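
The three steps can be sketched as below, with concrete rules filled in as assumptions because the abstract does not specify them: sources are merged by a shared record key, attributes with a zero value range are dropped, and the remaining features are ranked by a simple class-separation score. All names and the toy records are hypothetical.

```python
def integrate(*sources):
    """Step 1: merge records from several sources by their shared key."""
    merged = {}
    for src in sources:
        for key, rec in src.items():
            merged.setdefault(key, {}).update(rec)
    return merged

def drop_constant(records, features):
    """Step 2 (assumed rule): drop attributes whose value range is zero."""
    keep = []
    for f in features:
        vals = [r[f] for r in records.values() if f in r]
        if vals and max(vals) > min(vals):
            keep.append(f)
    return keep

def top_features(records, features, label, k):
    """Step 3 (assumed rule): keep the k features that best separate the classes."""
    def score(f):
        vals = [r[f] for r in records.values()]
        pos = [r[f] for r in records.values() if r[label] == 1]
        neg = [r[f] for r in records.values() if r[label] == 0]
        spread = max(vals) - min(vals)
        return abs(sum(pos) / len(pos) - sum(neg) / len(neg)) / spread
    return sorted(features, key=score, reverse=True)[:k]

# Hypothetical records from two sources, keyed by patient id.
clinic = {1: {"age": 50, "bp": 120}, 2: {"age": 60, "bp": 150},
          3: {"age": 55, "bp": 125}, 4: {"age": 52, "bp": 155}}
lab = {1: {"unit": 1, "glucose": 90, "sick": 0}, 2: {"unit": 1, "glucose": 140, "sick": 1},
       3: {"unit": 1, "glucose": 95, "sick": 0}, 4: {"unit": 1, "glucose": 150, "sick": 1}}

records = integrate(clinic, lab)
varying = drop_constant(records, ["age", "bp", "unit", "glucose"])
selected = top_features(records, varying, label="sick", k=2)
```

The scoring rule here is deliberately simple; any feature-significance measure could take its place in the third step without changing the overall structure.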