• Title/Summary/Keyword: Sample selection model


Wavelength selection by loading vector analysis in determining total protein in human serum using near-infrared spectroscopy and Partial Least Squares Regression

  • Kim, Yoen-Joo;Yoon, Gil-Won
    • Proceedings of the Korean Society of Near Infrared Spectroscopy Conference
    • /
    • 2001.06a
    • /
    • pp.4102-4102
    • /
    • 2001
  • In multivariate analysis, an absorbance spectrum is measured over a band of wavelengths, and the width of this band often receives little attention. However, it is desirable to measure the spectrum only at the necessary wavelengths, as long as acceptable prediction accuracy can be met. In this paper, a method of selecting an optimal band of wavelengths based on loading vector analysis was proposed and applied to determining total protein in human serum using near-infrared transmission spectroscopy and PLSR. Loading vectors in the full-spectrum PLSR were used as the reference in selecting wavelengths, but only the first loading vector was used since it explains the spectrum best. Absorbance spectra of sera from 97 outpatients were measured at 1530∼1850 nm with an interval of 2 nm. Total protein concentrations of the sera ranged from 5.1 to 7.7 g/㎗. Spectra were measured with a Cary 5E spectrophotometer (Varian, Australia). Serum in a 5 mm pathlength cuvette was placed in the sample beam and air in the reference beam. Full-spectrum PLSR was applied to determine total protein from the sera. Next, the wavelength region of 1672∼1754 nm was selected based on the first loading vector analysis. The Standard Error of Cross Validation (SECV) of full-spectrum PLSR (1530∼1850 nm) and selected-wavelength PLSR (1672∼1754 nm) was 0.28 and 0.27 g/㎗, respectively, so the prediction accuracy of the two bands was essentially equal. Wavelength selection based on the loading vector in PLSR seemed to be simple and robust in comparison to other methods based on the correlation plot, the regression vector, and the genetic algorithm. As a reference for wavelength selection in PLSR, the loading vector has an advantage over the correlation plot since the former is based on a multivariate model whereas the latter is based on a univariate model. Wavelength selection by the first loading vector analysis requires shorter computation time than selection by genetic algorithm and does not require smoothing.
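
The idea of picking a wavelength band from the first PLS loading vector can be sketched as follows. This is a minimal illustration, not the authors' procedure: the abstract does not state the selection rule, so the threshold `fraction`, the helper names, and the tiny synthetic spectra are all assumptions. The first loading is computed with the standard NIPALS step for a single-response PLS.

```python
import math

def first_pls_loading(X, y):
    """Return the first PLS1 loading vector for spectra X (one row per sample) and response y."""
    n, m = len(X), len(X[0])
    # Mean-center each wavelength column and the response.
    col_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_mean[j] for j in range(m)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector w is proportional to X^T y, scaled to unit length.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(m)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores t = X w, then loading p = X^T t / (t^T t).
    t = [sum(Xc[i][j] * w[j] for j in range(m)) for i in range(n)]
    tt = sum(v * v for v in t)
    return [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(m)]

def select_band(wavelengths, loading, fraction=0.5):
    """Hypothetical rule: keep the span of wavelengths whose |loading| >= fraction * max |loading|."""
    peak = max(abs(v) for v in loading)
    keep = [wl for wl, v in zip(wavelengths, loading) if abs(v) >= fraction * peak]
    return min(keep), max(keep)

# Synthetic toy data (NOT from the paper): only 1700 and 1750 nm respond to concentration.
wavelengths = [1600, 1650, 1700, 1750, 1800]
conc = [5.1, 5.8, 6.4, 7.0, 7.7]
baseline = [0.101, 0.099, 0.100, 0.101, 0.099]
spectra = [[b, 0.1, 0.05 * c, 0.04 * c, 0.1] for b, c in zip(baseline, conc)]

loading = first_pls_loading(spectra, conc)
band = select_band(wavelengths, loading)
print(band)
```

On real spectra the loading vector is smooth rather than piecewise, so the threshold would need to be chosen by inspecting the loading plot or by cross-validation.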


Measurement Error Variance Estimation Based on Complex Survey Data with Subsample Re-Measurements

  • Heo, Sunyeong;Eltinge, John L.
    • Communications for Statistical Applications and Methods
    • /
    • v.10 no.2
    • /
    • pp.553-566
    • /
    • 2003
  • In many cases, the measurement error variances may be functions of the unknown true values or related covariates. This paper considers design-based estimators of the parameters of these variance functions based on the within-unit sample variances. The paper aims to: (1) define an error scale factor $\delta$; (2) develop estimators of the parameters of the linear measurement error variance function of the true values under large-sample and small-error conditions; (3) use propensity methods to adjust survey weights to account for possible selection effects at the replicate level. The proposed methods are applied to medical examination data from the U.S. Third National Health and Nutrition Examination Survey (NHANES III).

Designing the Archival Contents Sample for Education Based on Curriculum-standards Analysis (교육과정 분석에 따른 교육용 기록정보콘텐츠의 예시 설계)

  • Lee, Eun Yeong
    • Journal of Korean Society of Archives and Records Management
    • /
    • v.11 no.2
    • /
    • pp.165-188
    • /
    • 2011
  • This paper suggests methods of curriculum-standards analysis and collection analysis, and designs a sample of archival contents for education based on these methods. By designing archival contents on national unification for high school classrooms, the paper intends to verify the methods and to suggest a model of educational archival contents that can be used in connection with the curriculum standards.

High-dimensional linear discriminant analysis with moderately clipped LASSO

  • Chang, Jaeho;Moon, Haeseong;Kwon, Sunghoon
    • Communications for Statistical Applications and Methods
    • /
    • v.28 no.1
    • /
    • pp.21-37
    • /
    • 2021
  • There is a direct connection between linear discriminant analysis (LDA) and linear regression since the direction vector of the LDA can be obtained by least squares estimation. This connection motivates the penalized LDA when the model is high-dimensional, that is, when the number of predictive variables is larger than the sample size. In this paper, we study the penalized LDA for a class of penalties, called the moderately clipped LASSO (MCL), which interpolates between the least absolute shrinkage and selection operator (LASSO) and the minimax concave penalty. We prove that the MCL penalized LDA correctly identifies the sparsity of the Bayes direction vector with probability tending to one, which is supported by better finite sample performance than LASSO in numerical studies.
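
The LDA and regression connection mentioned in the abstract can be checked numerically in the unpenalized, two-class case. The sketch below is an illustration only, under assumed toy data in two dimensions: it compares the classical LDA direction (within-class scatter inverse times the mean difference) with the least squares slope obtained by regressing 0/1 class labels on the centered predictors. It does not implement the MCL penalty.

```python
def solve2(A, b):
    """Solve a 2x2 linear system A x = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - b[0] * A[1][0]) / det]

def mean2(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

def scatter2(points, center):
    """Sum of outer products of (point - center)."""
    S = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - center[0], p[1] - center[1]]
        for r in range(2):
            for c in range(2):
                S[r][c] += d[r] * d[c]
    return S

def lda_direction(class0, class1):
    """Within-class scatter inverse applied to the difference of class means."""
    m0, m1 = mean2(class0), mean2(class1)
    S0, S1 = scatter2(class0, m0), scatter2(class1, m1)
    Sw = [[S0[r][c] + S1[r][c] for c in range(2)] for r in range(2)]
    return solve2(Sw, [m1[0] - m0[0], m1[1] - m0[1]])

def ols_direction(class0, class1):
    """Least squares slope from regressing 0/1 labels on centered predictors."""
    pts = class0 + class1
    y = [0.0] * len(class0) + [1.0] * len(class1)
    m = mean2(pts)
    y_mean = sum(y) / len(y)
    St = scatter2(pts, m)
    xty = [sum((p[k] - m[k]) * (v - y_mean) for p, v in zip(pts, y)) for k in range(2)]
    return solve2(St, xty)

# Toy data, chosen only for illustration.
class0 = [(1.0, 2.0), (2.0, 1.5), (1.5, 3.0), (2.5, 2.5)]
class1 = [(4.0, 3.0), (5.0, 4.5), (4.5, 3.5), (6.0, 4.0)]

d_lda = lda_direction(class0, class1)
d_ols = ols_direction(class0, class1)
# A zero 2D cross product means the two vectors point along the same line.
cross = d_lda[0] * d_ols[1] - d_lda[1] * d_ols[0]
print(d_lda, d_ols, cross)
```

The two vectors differ only by a positive scale factor, which is why a penalized regression can serve as a route to a sparse discriminant direction.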

Comparison of log-logistic and generalized extreme value distributions for predicted return level of earthquake (지진 재현수준 예측에 대한 로그-로지스틱 분포와 일반화 극단값 분포의 비교)

  • Ko, Nak Gyeong;Ha, Il Do;Jang, Dae Heung
    • The Korean Journal of Applied Statistics
    • /
    • v.33 no.1
    • /
    • pp.107-114
    • /
    • 2020
  • Extreme value distributions have often been used for the analysis (e.g., prediction of return levels) of data observed from natural disasters. By extreme value theory, the block maxima asymptotically follow the generalized extreme value distribution as the sample size increases; however, this may not hold in small samples. To address this problem, this paper proposes the use of a log-logistic (LLG) distribution, whose validity is evaluated through a goodness-of-fit test and model selection. The proposed method is illustrated with data on annual maximum earthquake magnitudes in China. We present the predicted return level and confidence interval for each return period using the LLG distribution.
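
For reference, a return level for return period T is the quantile at probability 1 - 1/T of the fitted distribution of block maxima. The sketch below evaluates this quantile for the GEV and log-logistic families using their standard closed-form quantile functions. The parameter values are hypothetical placeholders, not estimates from the paper, and the function names are illustrative.

```python
import math

def gev_return_level(T, mu, sigma, xi):
    """Level exceeded on average once every T blocks under GEV(mu, sigma, xi)."""
    y = -math.log(1.0 - 1.0 / T)
    if abs(xi) < 1e-12:
        # Gumbel limit of the GEV as the shape parameter goes to zero.
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))

def llg_return_level(T, alpha, beta):
    """Same quantity for a log-logistic distribution with scale alpha and shape beta."""
    p = 1.0 - 1.0 / T
    return alpha * (p / (1.0 - p)) ** (1.0 / beta)

def llg_cdf(x, alpha, beta):
    """Log-logistic distribution function, used here to check the quantile."""
    return 1.0 / (1.0 + (x / alpha) ** (-beta))

# Hypothetical parameters for annual maximum magnitudes (assumed, not fitted).
for T in (10, 50, 100):
    print(T, gev_return_level(T, 6.0, 0.5, 0.1), llg_return_level(T, 6.0, 20.0))
```

With T = 2 the log-logistic return level reduces to the scale parameter, which is the median of that distribution.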

Exploring the Core Keywords of the Secondary School Home Economics Teacher Selection Test: A Mixed Method of Content and Text Network Analyses (중등학교 가정과교사 임용시험의 핵심 키워드 탐색: 내용 분석과 텍스트 네트워크 분석을 중심으로)

  • Mi Jeong, Park;Ju, Han
    • Human Ecology Research
    • /
    • v.60 no.4
    • /
    • pp.625-643
    • /
    • 2022
  • The purpose of this study was to explore the trends and core keywords of the secondary school home economics teacher selection test using content analysis and text network analysis. The sample comprised texts of the secondary school home economics teacher 1st selection test for the 2017-2022 school years. Determination of frequency of occurrence, generation of word clouds, centrality analysis, and topic modeling were performed using NetMiner 4.4. The key results were as follows. First, content analysis revealed that the number of questions and scores for each subject (field) has remained constant since 2020, unlike before 2020. In terms of subjects, most questions focused on 'theory of home economics education', and among the evaluation content elements, the highest percentage of questions asked was for 'home economics teaching·learning methods and practice'. Second, the network of the secondary school home economics teacher selection test covering the 2017-2022 school years has an extremely weak density. For the 2017-2019 school years, 'learning', 'evaluation', 'instruction', and 'method' appeared as important keywords, and seven topics were extracted. For the 2020-2022 school years, 'evaluation', 'class', 'learning', 'cycle', and 'model' were influential keywords, and five topics were extracted. This study is meaningful in that it attempted a new research method combining content analysis and text network analysis and prepared basic data for the revision of the evaluation area and evaluation content elements of the secondary school home economics teacher selection test.

Adaptive lasso in sparse vector autoregressive models (Adaptive lasso를 이용한 희박벡터자기회귀모형에서의 변수 선택)

  • Lee, Sl Gi;Baek, Changryong
    • The Korean Journal of Applied Statistics
    • /
    • v.29 no.1
    • /
    • pp.27-39
    • /
    • 2016
  • This paper considers variable selection in the sparse vector autoregressive (sVAR) model, where sparsity comes from setting small coefficients to exact zeros. From the estimation perspective, Davis et al. (2015) showed that lasso-type regularization is successful because it provides simultaneous variable selection and parameter estimation even for time series data. However, their simulation study reports that the regular lasso overestimates the number of non-zero coefficients, hence its finite sample performance needs improvement. In this article, we show that the adaptive lasso significantly improves performance, recovering the sparsity patterns better than the regular lasso. The selection of tuning parameters in the adaptive lasso is also discussed on the basis of the simulation study.
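
Why adaptive weights yield sparser fits than the regular lasso can be seen in a deliberately simplified setting. The sketch below assumes an orthonormal design and the objective (1/2)||y - Xb||^2 + lam * sum_j w_j |b_j|, for which both estimators reduce to soft-thresholding of the least squares coefficients. This is not the sVAR estimation used in the paper, and the coefficient values, `lam`, and `gamma` are assumptions for illustration.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_orthonormal(beta_ols, lam):
    """Regular lasso: the same threshold lam for every coefficient."""
    return [soft_threshold(b, lam) for b in beta_ols]

def adaptive_lasso_orthonormal(beta_ols, lam, gamma=1.0):
    """Adaptive lasso: threshold lam * w_j with w_j = 1 / |initial estimate|^gamma."""
    out = []
    for b in beta_ols:
        # Small initial estimates receive heavy penalties, large ones light penalties.
        w = float("inf") if b == 0 else 1.0 / abs(b) ** gamma
        out.append(soft_threshold(b, lam * w))
    return out

# Hypothetical least squares estimates: two strong signals and two near-zero ones.
beta_ols = [2.0, -1.5, 0.2, -0.1]
lasso = lasso_orthonormal(beta_ols, 0.15)
adaptive = adaptive_lasso_orthonormal(beta_ols, 0.15)
print(lasso)
print(adaptive)
```

In this toy case the regular lasso keeps one of the near-zero coefficients while the adaptive version removes both and shrinks the large coefficients less, which mirrors the over-selection behavior described in the abstract.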

Integrated Manufacturing Systems Design : Integrated Approach to Process Plan Selection and AGV Guidepath Design (통합 제조 시스템 설계 : 공정 계획과 AGV 경로 설계의 통합 접근)

  • Seo, Yoon-Ho
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.20 no.3
    • /
    • pp.151-166
    • /
    • 1994
  • The manufacturing environment on which this research is focused is an FMS in which AGVs are used for material handling and each part type has one or more process plans. The research aims at developing a methodology whereby, given a part and volume mix for production during any production session, the best set of process plans including one plan per part type is selected and the best unidirectional AGV guidepath can be dynamically reconfigured in response to changes in parts and lot sizes combination. For the integrated PPS/FGD problem in which two functions of process plan selection (PPS) and flexible AGV guidepath design (FGD) are integrated, a zero-one integer programming model is developed. The integrated problem is decomposed into two subproblems, process plan selection given a directed AGV layout and AGV guidepath design with a fixed process plan per part type. A heuristic algorithm that alternately and iteratively solves these two subproblems is developed. The effectiveness of the heuristic algorithm is tested by solving various randomly generated sample problems and comparing the heuristic solutions with those obtained by an exact procedure. From the test results, the following conclusions are drawn: 1) For a reasonable size problem, the heuristic is very effective. 2) By integrating the two functions of PPS and FGD, a remarkable benefit in total production time for a given part and volume mix is gained.


The Effects of Selection Attributes on Customers' Satisfaction and Behavioral Intention for Hotel Weddings - Focusing on Young People's Life Style - (호텔 예식 선택 속성의 만족도와 행동의도에 관한 연구 - 라이프스타일을 중심으로 -)

  • Ryoo, Kyung-Min;Park, Jung-Ha
    • Culinary science and hospitality research
    • /
    • v.16 no.2
    • /
    • pp.199-214
    • /
    • 2010
  • This study aims to investigate the effects of selection attributes on customers' satisfaction and behavioral intention for hotel weddings depending on their lifestyle. The hypotheses were tested using a sample of customers aged 20 to 30, living in Seoul and Daejeon, who had held a hotel wedding. A total of 300 self-administered questionnaires were distributed, and 248 valid responses were used for the analysis. To examine the proposed model, statistical tests were conducted using SPSS (14.0). The results showed that the attributes customers consider when selecting a hotel wedding differed significantly depending on their lifestyle. It was also found that customers' satisfaction has a significantly positive effect on their behavioral intention for hotel weddings.


Error Estimation Method for Matrix Correlation-Based Wi-Fi Indoor Localization

  • Sun, Yong-Liang;Xu, Yu-Bin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.7 no.11
    • /
    • pp.2657-2675
    • /
    • 2013
  • A novel neighbor selection-based fingerprinting algorithm using matrix correlation (MC) for Wi-Fi localization is presented in this paper. Compared with classic fingerprinting algorithms that usually employ a single received signal strength (RSS) sample, the presented algorithm uses multiple on-line RSS samples in the form of a matrix and measures correlations between the on-line RSS matrix and RSS matrices in the radio-map. The algorithm makes efficient use of on-line RSS information and considers RSS variations of reference points (RPs) for localization, so it offers more accurate localization results than classic neighbor selection-based algorithms. Based on the MC algorithm, an error estimation method using artificial neural network is also presented to fuse available information that includes RSS samples and localization results computed by the MC algorithm and model the nonlinear relationship between the available information and localization errors. In the on-line phase, localization errors are estimated and then used to correct the localization results to reduce negative influences caused by a static radio-map and RP distribution. Experimental results demonstrate that the MC algorithm outperforms the other neighbor selection-based algorithms and the error estimation method can reduce the mean of localization errors by nearly half.
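
The neighbor selection step can be sketched as below. The abstract does not define the matrix correlation measure, so this example assumes a Pearson correlation over the flattened RSS matrices and a plain average of the top-k reference point coordinates. The radio-map values, the field names `xy` and `rss`, and the function names are hypothetical, and the neural network error estimation stage is not shown.

```python
import math

def matrix_correlation(A, B):
    """Assumed measure: Pearson correlation of two equally shaped matrices, flattened."""
    a = [v for row in A for v in row]
    b = [v for row in B for v in row]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den

def locate(online, radio_map, k=2):
    """Average the coordinates of the k reference points most correlated with `online`."""
    ranked = sorted(radio_map,
                    key=lambda rp: matrix_correlation(online, rp["rss"]),
                    reverse=True)
    top = ranked[:k]
    return (sum(rp["xy"][0] for rp in top) / k, sum(rp["xy"][1] for rp in top) / k)

# Hypothetical radio-map: rows are RSS samples, columns are access points (dBm).
radio_map = [
    {"xy": (0.0, 0.0), "rss": [[-40, -70, -80], [-42, -69, -81]]},
    {"xy": (2.0, 0.0), "rss": [[-55, -55, -75], [-56, -57, -74]]},
    {"xy": (10.0, 0.0), "rss": [[-80, -70, -40], [-79, -72, -41]]},
]
online = [[-45, -65, -79], [-44, -66, -78]]

estimate = locate(online, radio_map)
print(estimate)
```

Using a matrix of several on-line samples rather than a single RSS vector lets the comparison reflect the variation across samples at each reference point, which is the motivation given in the abstract.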