• Title/Summary/Keyword: Data selection


Bayesian estimation for finite population proportion under selection bias via surrogate samples

  • Choi, Seong Mi;Kim, Dal Ho
    • Journal of the Korean Data and Information Science Society / v.24 no.6 / pp.1543-1550 / 2013
  • In this paper, we study Bayesian estimation for the finite population proportion in binary data under selection bias. We use a Bayesian nonignorable selection model to accommodate the selection mechanism. We compare four possible estimators of the finite population proportion based on data analysis as well as Monte Carlo simulation. It turns out that the nonignorable selection model might be useful for weakly biased samples.

Performance Comparison of Classification Methods with the Combinations of the Imputation and Gene Selection Methods

  • Kim, Dong-Uk;Nam, Jin-Hyun;Hong, Kyung-Ha
    • The Korean Journal of Applied Statistics / v.24 no.6 / pp.1103-1113 / 2011
  • Gene expression data are obtained through many experimental stages, and errors produced during the process may cause missing values. Because the data are of the so-called 'small n, large p' type, genes have to be selected before statistical analyses such as classification. For this reason, imputation and gene selection are important in microarray data analysis. In the literature, imputation, gene selection and classification analysis have been studied separately. However, they are applied sequentially in practice. We therefore compare the performance of classification methods after imputation and gene selection methods are applied to microarray data. Numerical simulations are carried out to evaluate the classification methods that use various combinations of the imputation and gene selection methods.

ASVMRT: Materialized View Selection Algorithm in Data Warehouse

  • Yang, Jin-Hyuk;Chung, In-Jeong
    • Journal of Information Processing Systems / v.2 no.2 / pp.67-75 / 2006
  • In order to acquire a precise and quick response to an analytical query, proper selection of the views to materialize in the data warehouse is crucial. In traditional view selection algorithms, all relations are considered for selection as materialized views. However, materializing all relations rather than a part results in much worse performance in terms of time and space costs. Therefore, we present an improved algorithm for selecting views to materialize that uses clustering to overcome the problems of conventional view selection algorithms. In the presented algorithm, ASVMRT (Algorithm for Selection of Views to Materialize using Reduced Table), we first generate reduced tables in the data warehouse using clustering based on attribute-value density, and then we consider combinations of reduced tables as materialized views instead of combinations of the original base relations. To justify the proposed algorithm, we present experimental results in which both time and space costs are approximately 1.8 times better than those of conventional algorithms.

An Exploration on the Use of Data Envelopment Analysis for Product Line Selection

  • Lin, Chun-Yu;Okudan, Gul E.
    • Industrial Engineering and Management Systems / v.8 no.1 / pp.47-53 / 2009
  • We define the product line (or mix) selection problem as selecting a subset of potential product variants that can simultaneously minimize product proliferation and maintain market coverage. Selecting the most efficient product mix is a complex problem, which requires analyses of multiple criteria. This paper proposes a method based on Data Envelopment Analysis (DEA) for product line selection. DEA is a linear programming based technique commonly used for measuring the relative performance of a group of decision making units with multiple inputs and outputs. Although DEA has proven to be an effective evaluation tool in many fields, it has not been applied to the product line selection problem. In this study, we construct a five-step method that systematically adopts DEA to solve a product line selection problem. We then apply the proposed method to an existing line of staplers to provide quantitative evidence that helps managers make decisions that maximize company profits while also fulfilling market demands.

Evaluation of Attribute Selection Methods and Prior Discretization in Supervised Learning

  • Cha, Woon Ock;Huh, Moon Yul
    • Communications for Statistical Applications and Methods / v.10 no.3 / pp.879-894 / 2003
  • We evaluated the efficiency of applying attribute selection methods and prior discretization to supervised learning, modelled by C4.5 and Naive Bayes. Three databases were obtained from the UCI data archive, each consisting of continuous attributes except for one decision attribute. Four methods were used for attribute selection: MDI, ReliefF, Gain Ratio and the Consistency-based method. MDI and ReliefF can be used for both continuous and discrete attributes, but the other two methods can be used only for discrete attributes. Discretization was performed using the Fayyad and Irani method. To investigate the effect of noise in the database, noise was introduced into the data sets at levels of 10% or 20%, and then the data, both with and without noise, were processed through the steps of attribute selection, discretization and classification. The results of this study indicate that classification based on selected attributes yields higher accuracy than classification of the full data set, and that prior discretization does not lower the accuracy.
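
As an illustration of one of the four attribute selection criteria named in this abstract, the sketch below computes the gain ratio of a discrete attribute using the textbook C4.5-style definition (information gain divided by split information). It is a minimal sketch only: the function names and the toy data are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of one discrete attribute with respect to the class labels."""
    total = len(labels)
    # Group the class labels by attribute value.
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected class entropy after splitting on the attribute.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(values)  # entropy of the attribute itself
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: attribute_a separates the classes, attribute_b does not.
labels = ["yes", "yes", "no", "no"]
attribute_a = ["x", "x", "y", "y"]
attribute_b = ["x", "y", "x", "y"]
print(gain_ratio(attribute_a, labels))  # 1.0
print(gain_ratio(attribute_b, labels))  # 0.0
```

Attributes would then be ranked by this score. Because the computation groups rows by exact attribute value, it only makes sense for discrete attributes, which is consistent with the abstract's remark that Gain Ratio is limited to discrete data.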

Generalization of Road Network using Logistic Regression

  • Park, Woojin;Huh, Yong
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.37 no.2 / pp.91-97 / 2019
  • In automatic map generalization, the formalization of cartographic principles is important. This study proposes and evaluates the selection method for road network generalization that analyzes existing maps using reverse engineering and formalizes the selection rules for the road network. Existing maps with a 1:5,000 scale and a 1:25,000 scale are compared, and the criteria for selection of the road network data and the relative importance of each network object are determined and analyzed using Töpfer's Radical Law as well as the logistic regression model. The selection model derived from the analysis result is applied to the test data, and road network data for the 1:25,000 scale map are generated from the digital topographic map on a 1:5,000 scale. The selected road network is compared with the existing road network data on the 1:25,000 scale for a qualitative and quantitative evaluation. The result indicates that more than 80% of road objects are matched to existing data.
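
For orientation, Töpfer's Radical Law in its basic form estimates how many objects to retain on a smaller-scale map as n_target = n_source × sqrt(M_source / M_target), where M is the scale denominator. The sketch below applies that basic form to the two scales named in the abstract. The function name and the object count of 1,000 are hypothetical, and the paper may use a variant of the law with additional constants.

```python
import math


def topfer_retained_count(n_source, source_scale_denominator, target_scale_denominator):
    """Basic Radical Law: n_target = n_source * sqrt(M_source / M_target)."""
    return n_source * math.sqrt(source_scale_denominator / target_scale_denominator)


# Hypothetical example: 1,000 road objects on the 1:5,000 source map.
expected = topfer_retained_count(1000, 5_000, 25_000)
print(round(expected))  # 447
```

The law gives only the number of objects to keep. Deciding which objects to keep is the part the abstract addresses with a logistic regression model.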

Performance Analysis of Best Relay Selection in Cooperative Multicast Systems Based on Superposition Transmission (중첩 전송 기반 무선 협력 멀티캐스트 시스템에서 중계 노드 선택 기법에 대한 성능 분석)

  • Lee, In-Ho
    • Journal of the Korea Institute of Information and Communication Engineering / v.22 no.3 / pp.520-526 / 2018
  • In this paper, considering the superposition transmission-based wireless cooperative multicast communication system (ST-CMS) with multiple relays and destinations, we propose a relay selection scheme to improve the data rate of multicast communication. In addition, we adopt the optimal power allocation coefficient for the superposition transmission to maximize the data rate of the proposed relay selection scheme. To propose the relay selection scheme, we derive an approximate expression for the data rate of the ST-CMS, and present the relay selection scheme using only partial channel state information based on the approximate expression. Moreover, we derive an approximate average data rate of the proposed relay selection scheme. Through numerical investigation, comparing the average data rates of the proposed relay selection scheme and the optimal relay selection scheme using full channel state information, we show that the proposed scheme provides extremely similar performance to the optimal scheme in the high signal-to-noise power ratio region.

Ensemble variable selection using genetic algorithm

  • Seogyoung, Lee;Martin Seunghwan, Yang;Jongkyeong, Kang;Seung Jun, Shin
    • Communications for Statistical Applications and Methods / v.29 no.6 / pp.629-640 / 2022
  • Variable selection is one of the most crucial tasks in supervised learning, such as regression and classification. The best subset selection is straightforward and optimal but not practically applicable unless the number of predictors is small. In this article, we propose directly solving the best subset selection via the genetic algorithm (GA), a popular stochastic optimization algorithm based on the principle of Darwinian evolution. To further improve the variable selection performance, we propose to run multiple GA to solve the best subset selection and then synthesize the results, which we call ensemble GA (EGA). The EGA significantly improves variable selection performance. In addition, the proposed method is essentially the best subset selection and hence applicable to a variety of models with different selection criteria. We compare the proposed EGA to existing variable selection methods under various models, including linear regression, Poisson regression, and Cox regression for survival data. Both simulation and real data analysis demonstrate the promising performance of the proposed method.
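
The idea of running several genetic algorithms over variable subsets and then combining their answers can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' EGA: the elitist GA with one-point crossover, the majority-vote combination rule, all parameter defaults, and the `toy_score` criterion are hypothetical choices. In practice the score would be a model selection criterion computed from a fitted model.

```python
import random


def run_ga(score, n_vars, rng, pop_size=20, generations=30, mutation_rate=0.1):
    """Maximise score(mask) over 0/1 inclusion masks with a simple elitist GA."""
    population = [[rng.randint(0, 1) for _ in range(n_vars)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # the fitter half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_vars - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=score)


def ensemble_ga(score, n_vars, n_runs=5, seed=0):
    """Run several GAs and keep variables chosen in more than half of the runs."""
    masks = [run_ga(score, n_vars, random.Random(seed + r)) for r in range(n_runs)]
    votes = [sum(m[j] for m in masks) for j in range(n_vars)]
    return [1 if v > n_runs / 2 else 0 for v in votes]


def toy_score(mask):
    """Hypothetical criterion: reward variables 0 and 2, penalise all others."""
    useful = {0, 2}
    return sum(1 if j in useful else -1 for j, g in enumerate(mask) if g)


selected = ensemble_ga(toy_score, n_vars=6)
print(selected)
```

Each run is seeded separately so the runs explore different populations while the whole procedure stays reproducible. Majority voting is only one possible way to synthesize the runs.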

On loss functions for model selection in wavelet based Bayesian method

  • Park, Chun-Gun
    • Journal of the Korean Data and Information Science Society / v.20 no.6 / pp.1191-1197 / 2009
  • Most Bayesian approaches to model selection in wavelet analysis have the drawback that obtaining an accurate fit of the unknown function is computationally expensive. To overcome this drawback, this article introduces loss functions that serve as criteria for level-dependent threshold selection in wavelet-based Bayesian methods with arbitrary size and regular design points. We demonstrate the utility of these criteria using four test functions and real data.


Member Selection Procedure in the Steel Structural Design (강구조물설계에서 부재선정의 시스템화 방법론)

  • 이영호;김상철;김흥국;이병해
    • Proceedings of the Computational Structural Engineering Institute Conference / 1995.10a / pp.197-206 / 1995
  • In the structural design procedure, member selection manages complex data relationships and reflects the structural expert's knowledge. It is difficult to construct an effective system with conventional programming techniques. A knowledge-based system is a software system capable of supporting the explicit representation of the expert's knowledge in the member selection process through member data and reasoning mechanisms. This study describes a useful methodology for structuring knowledge and representing the relation between member data and knowledge. It also shows the application of this methodology to member selection in steel structural design.
