• Title/Abstract/Keywords: sample selection model

Wavelength selection by loading vector analysis in determining total protein in human serum using near-infrared spectroscopy and Partial Least Squares Regression

  • Kim, Yoen-Joo;Yoon, Gil-Won
    • 한국근적외분광분석학회:학술대회논문집
    • /
    • 한국근적외분광분석학회, 2001, NIR-2001
    • /
    • pp.4102-4102
    • /
    • 2001
  • In multivariate analysis, an absorbance spectrum is measured over a band of wavelengths, and the width of this band often receives little attention. It is desirable, however, to measure the spectrum only at the necessary wavelengths, as long as acceptable prediction accuracy can be met. In this paper, a method of selecting an optimal band of wavelengths based on loading vector analysis is proposed and applied to determining total protein in human serum using near-infrared transmission spectroscopy and PLSR. Loading vectors from the full-spectrum PLSR were used as the reference for selecting wavelengths, but only the first loading vector was used since it explains the spectrum best. Absorbance spectra of sera from 97 outpatients were measured at 1530∼1850 nm with an interval of 2 nm. Total protein concentrations of the sera ranged from 5.1 to 7.7 g/㎗. Spectra were measured with a Cary 5E spectrophotometer (Varian, Australia). Serum in a 5 mm-pathlength cuvette was placed in the sample beam and air in the reference beam. Full-spectrum PLSR was first applied to determine total protein in the sera. Next, the wavelength region of 1672∼1754 nm was selected based on the first loading vector analysis. The Standard Error of Cross Validation (SECV) of the full-spectrum PLSR (1530∼1850 nm) and the selected-wavelength PLSR (1672∼1754 nm) was 0.28 and 0.27 g/㎗, respectively, so the prediction accuracy of the two bands was nearly equal. Wavelength selection based on the loading vector in PLSR appeared simple and robust in comparison with other methods based on the correlation plot, the regression vector, and the genetic algorithm. As a reference for wavelength selection in PLSR, the loading vector has an advantage over the correlation plot since the former is based on a multivariate model whereas the latter is based on a univariate model. Wavelength selection by the first loading vector analysis requires less computation time than selection by genetic algorithm and does not require smoothing.
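
The idea of picking a band from the first PLS loading vector can be sketched as follows. This is a minimal illustration on synthetic spectra, not the authors' procedure: the NIPALS formulas for the first PLS1 component are standard, but the thresholding rule (`frac`), the Gaussian absorption peak at 1710 nm, and the noise level are all assumptions made for the demo.

```python
import numpy as np

def first_pls_loading(X, y):
    """First PLS1 loading vector (NIPALS) computed on mean-centered data."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc                  # weight vector: covariance of each wavelength with y
    w = w / np.linalg.norm(w)
    t = Xc @ w                     # scores of the first latent variable
    return Xc.T @ t / (t @ t)      # loading vector p

def select_band(wavelengths, loading, frac=0.5):
    """Keep wavelengths with |loading| >= frac * max|loading| (hypothetical rule)."""
    mask = np.abs(loading) >= frac * np.abs(loading).max()
    return wavelengths[mask]

# Synthetic stand-in for the measurement grid described in the abstract.
rng = np.random.default_rng(0)
wavelengths = np.arange(1530, 1852, 2)            # 1530..1850 nm, 2 nm step
conc = rng.uniform(5.1, 7.7, size=97)             # assumed concentrations, g/dL
peak = np.exp(-0.5 * ((wavelengths - 1710) / 20.0) ** 2)   # assumed band shape
X = np.outer(conc, peak) + 0.01 * rng.standard_normal((97, wavelengths.size))

p = first_pls_loading(X, conc)
band = select_band(wavelengths, p)
print("selected band:", band.min(), "to", band.max(), "nm")
```

A PLSR model would then be refitted on the columns of `X` inside `band` and its cross-validated error compared with the full-spectrum model.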

Measurement Error Variance Estimation Based on Complex Survey Data with Subsample Re-Measurements

  • Heo, Sunyeong;Eltinge, John L.
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 10, No. 2
    • /
    • pp.553-566
    • /
    • 2003
  • In many cases, the measurement error variances may be functions of the unknown true values or related covariates. This paper considers design-based estimators of the parameters of these variance functions based on the within-unit sample variances. This paper aims to: (1) define an error scale factor $\delta$; (2) develop estimators of the parameters of the linear measurement error variance function of the true values under large-sample and small-error conditions; (3) use propensity methods to adjust survey weights to account for possible selection effects at the replicate level. The proposed methods are applied to medical examination data from the U.S. Third National Health and Nutrition Examination Survey (NHANES III).

Designing the Archival Contents Sample for Education Based on Curriculum-standards Analysis

  • 이은영
    • 한국기록관리학회지
    • /
    • Vol. 11, No. 2
    • /
    • pp.165-188
    • /
    • 2011
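  • This paper presents methods for curriculum analysis and collection analysis, the key steps in developing educational archival information content for teachers and students, and designs a sample of such content following these methods. By designing educational archival content that can serve as teaching material for unification education in high school classes, the paper tests the applicability of the proposed curriculum analysis and collection analysis and proposes a model of educational archival content that can be used in connection with the curriculum.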

High-dimensional linear discriminant analysis with moderately clipped LASSO

  • Chang, Jaeho;Moon, Haeseong;Kwon, Sunghoon
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 28, No. 1
    • /
    • pp.21-37
    • /
    • 2021
  • There is a direct connection between linear discriminant analysis (LDA) and linear regression, since the direction vector of the LDA can be obtained by least squares estimation. This connection motivates the penalized LDA when the model is high-dimensional, that is, when the number of predictive variables is larger than the sample size. In this paper, we study the penalized LDA for a class of penalties, called the moderately clipped LASSO (MCL), which interpolates between the least absolute shrinkage and selection operator (LASSO) and the minimax concave penalty. We prove that the MCL-penalized LDA correctly identifies the sparsity of the Bayes direction vector with probability tending to one, which is supported by better finite-sample performance than LASSO in concrete numerical studies.
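
The LDA and least squares connection mentioned in the abstract can be checked numerically in the unpenalized, low-dimensional two-class case. The sketch below is only that check: it does not implement the MCL penalty or the high-dimensional setting, and the simulated Gaussian data and the function names are assumptions for illustration.

```python
import numpy as np

def lda_direction(X, y):
    """Two-class LDA direction: Sw^{-1} (m1 - m0), Sw = within-class scatter."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    return np.linalg.solve(Sw, m1 - m0)

def ols_direction(X, y):
    """Least squares slope from regressing the centered 0/1 label on centered X."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    return beta

# Simulated two-class data (assumed: identity covariance, shifted means).
rng = np.random.default_rng(1)
n0, n1, d = 60, 40, 5
X = np.vstack([
    rng.standard_normal((n0, d)),
    rng.standard_normal((n1, d)) + np.array([1.0, 0.5, 0.0, 0.0, 0.0]),
])
y = np.concatenate([np.zeros(n0), np.ones(n1)])

a, b = lda_direction(X, y), ols_direction(X, y)
cos_sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print("cosine similarity between LDA and OLS directions:", cos_sim)
```

The two vectors differ only by a positive scale factor, which is why a penalty added to the regression problem carries over to the discriminant direction.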

Comparison of log-logistic and generalized extreme value distributions for predicted return level of earthquake

  • 고낙경;하일도;장대흥
    • 응용통계연구
    • /
    • Vol. 33, No. 1
    • /
    • pp.107-114
    • /
    • 2020
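  • The generalized extreme value (GEV) distribution has often been used for analyses such as return level prediction on data observed from natural disasters. When the sample size is sufficiently large, successive block maxima asymptotically follow the GEV distribution. With small samples, however, this may not hold. To address this problem, this paper proposes using the log-logistic distribution, supported by goodness-of-fit tests and model selection. As an illustration, return levels and confidence intervals for several return periods are presented for Chinese earthquake data using the log-logistic distribution.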
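
For reference, the return level for a return period T is the quantile at probability 1 - 1/T, which has a closed form under both distributions. The sketch below uses the standard quantile functions; the parameter values are hypothetical and are not estimates from the paper's earthquake data.

```python
import math

def gev_return_level(T, mu, sigma, xi):
    """GEV quantile at 1 - 1/T (location mu, scale sigma, shape xi)."""
    y = -math.log(1.0 - 1.0 / T)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)            # Gumbel limit as xi -> 0
    return mu + sigma / xi * (y ** (-xi) - 1.0)

def loglogistic_return_level(T, alpha, beta):
    """Log-logistic quantile at 1 - 1/T (scale alpha, shape beta).

    From F(x) = 1 / (1 + (x / alpha) ** -beta), the quantile at p is
    alpha * (p / (1 - p)) ** (1 / beta), and p / (1 - p) = T - 1.
    """
    return alpha * (T - 1.0) ** (1.0 / beta)

# Hypothetical parameters, chosen only to show the call pattern.
for T in (10, 50, 100):
    print(T,
          round(gev_return_level(T, mu=5.0, sigma=0.5, xi=0.1), 3),
          round(loglogistic_return_level(T, alpha=5.0, beta=12.0), 3))
```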

Exploring the Core Keywords of the Secondary School Home Economics Teacher Selection Test: A Mixed Method of Content and Text Network Analyses

  • 박미정;한주
    • Human Ecology Research
    • /
    • Vol. 60, No. 4
    • /
    • pp.625-643
    • /
    • 2022
  • The purpose of this study was to explore the trends and core keywords of the secondary school home economics teacher selection test using content analysis and text network analysis. The sample comprised texts of the secondary school home economics teacher 1st selection test for the 2017-2022 school years. Determination of frequency of occurrence, generation of word clouds, centrality analysis, and topic modeling were performed using NetMiner 4.4. The key results were as follows. First, content analysis revealed that the number of questions and scores for each subject (field) has remained constant since 2020, unlike before 2020. In terms of subjects, most questions focused on 'theory of home economics education', and among the evaluation content elements, the highest percentage of questions asked was for 'home economics teaching·learning methods and practice'. Second, the network of the secondary school home economics teacher selection test covering the 2017-2022 school years has an extremely weak density. For the 2017-2019 school years, 'learning', 'evaluation', 'instruction', and 'method' appeared as important keywords, and seven topics were extracted. For the 2020-2022 school years, 'evaluation', 'class', 'learning', 'cycle', and 'model' were influential keywords, and five topics were extracted. This study is meaningful in that it attempted a new research method combining content analysis and text network analysis and prepared basic data for the revision of the evaluation area and evaluation content elements of the secondary school home economics teacher selection test.

Adaptive lasso in sparse vector autoregressive models

  • 이슬기;백창룡
    • 응용통계연구
    • /
    • Vol. 29, No. 1
    • /
    • pp.27-39
    • /
    • 2016
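  • This paper studies efficient parameter estimation in sparse vector autoregressive (VAR) models for multivariate time series analysis. A sparse VAR model achieves sparsity by setting coefficients that are close to zero exactly to zero. Lasso-based methods, which perform variable selection and parameter estimation simultaneously, can therefore be used to estimate sparse VAR models. However, Davis et al. (2015) reported through simulations that the ordinary lasso finds far more nonzero coefficients than the true number, a weakness in terms of sparsity. This study shows that applying the adaptive lasso to sparse VAR models yields a highly significant improvement over the ordinary lasso in sparsity and in overall parameter estimation. The selection of the tuning parameters used in the adaptive lasso is also discussed.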
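
A minimal sketch of the adaptive lasso applied equation by equation to a VAR(1) model is given below. It is not the paper's implementation: the lag order, the OLS-based weights with `gamma = 1`, the fixed penalty `lam`, the plain coordinate-descent solver, and the simulated coefficient matrix are all assumptions, and the series is taken to be zero-mean so no intercept is fitted.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                                   # residual y - Xb (b = 0 initially)
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]                    # remove coordinate j from the fit
            b[j] = soft_threshold(X[:, j] @ r / n, lam) / col_sq[j]
            r -= X[:, j] * b[j]                    # add the updated coordinate back
    return b

def adaptive_lasso_var1(Y, lam, gamma=1.0):
    """Row-wise adaptive lasso estimate of A in Y_t = A Y_{t-1} + e_t."""
    X, Z = Y[:-1], Y[1:]
    k = Y.shape[1]
    A_ols = np.linalg.lstsq(X, Z, rcond=None)[0].T # initial OLS estimate, k x k
    A = np.zeros((k, k))
    for i in range(k):
        w = 1.0 / (np.abs(A_ols[i]) ** gamma + 1e-8)   # adaptive weights
        b = lasso_cd(X / w, Z[:, i], lam)          # weighted L1 via column rescaling
        A[i] = b / w                               # map back to the original scale
    return A

# Simulated sparse VAR(1) (assumed coefficients, unit-variance Gaussian noise).
rng = np.random.default_rng(2)
k, T = 4, 500
A_true = 0.5 * np.eye(k)
A_true[0, 1] = 0.4
Y = np.zeros((T, k))
for t in range(1, T):
    Y[t] = A_true @ Y[t - 1] + rng.standard_normal(k)

A_hat = adaptive_lasso_var1(Y, lam=0.05)
print(np.round(A_hat, 2))
```

Small OLS coefficients receive large weights and are pushed to exactly zero, while large ones are penalized lightly. In practice `lam` and `gamma` would be chosen by a criterion such as cross-validation or an information criterion rather than fixed by hand.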

Integrated Manufacturing Systems Design: Integrated Approach to Process Plan Selection and AGV Guidepath Design

  • 서윤호
    • 대한산업공학회지
    • /
    • Vol. 20, No. 3
    • /
    • pp.151-166
    • /
    • 1994
  • The manufacturing environment on which this research is focused is an FMS in which AGVs are used for material handling and each part type has one or more process plans. The research aims at developing a methodology whereby, given a part and volume mix for production during any production session, the best set of process plans including one plan per part type is selected and the best unidirectional AGV guidepath can be dynamically reconfigured in response to changes in parts and lot sizes combination. For the integrated PPS/FGD problem in which two functions of process plan selection (PPS) and flexible AGV guidepath design (FGD) are integrated, a zero-one integer programming model is developed. The integrated problem is decomposed into two subproblems, process plan selection given a directed AGV layout and AGV guidepath design with a fixed process plan per part type. A heuristic algorithm that alternately and iteratively solves these two subproblems is developed. The effectiveness of the heuristic algorithm is tested by solving various randomly generated sample problems and comparing the heuristic solutions with those obtained by an exact procedure. From the test results, the following conclusions are drawn: 1) For a reasonable size problem, the heuristic is very effective. 2) By integrating the two functions of PPS and FGD, a remarkable benefit in total production time for a given part and volume mix is gained.

The Effects of Selection Attributes on Customers' Satisfaction and Behavioral Intention for Hotel Weddings - Focusing on Young People's Life Style -

  • 류경민;박정하
    • 한국조리학회지
    • /
    • Vol. 16, No. 2
    • /
    • pp.199-214
    • /
    • 2010
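  • This study examines satisfaction with hotel wedding selection attributes and behavioral intention, surveying customers who had experienced weddings at deluxe hotels in Seoul and Daejeon. A total of 300 questionnaires were distributed, of which 248 were used as data. After data coding, the data were analyzed with the SPSS 14.0 statistical package. Factor analysis extracted four lifestyle factors and five hotel wedding selection attribute factors. The results show that hotel selection attributes differ by hotel wedding lifestyle and that hotel wedding satisfaction affects hotel wedding behavioral intention.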

Error Estimation Method for Matrix Correlation-Based Wi-Fi Indoor Localization

  • Sun, Yong-Liang;Xu, Yu-Bin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 7, No. 11
    • /
    • pp.2657-2675
    • /
    • 2013
  • A novel neighbor selection-based fingerprinting algorithm using matrix correlation (MC) for Wi-Fi localization is presented in this paper. Compared with classic fingerprinting algorithms that usually employ a single received signal strength (RSS) sample, the presented algorithm uses multiple on-line RSS samples in the form of a matrix and measures correlations between the on-line RSS matrix and RSS matrices in the radio-map. The algorithm makes efficient use of on-line RSS information and considers RSS variations of reference points (RPs) for localization, so it offers more accurate localization results than classic neighbor selection-based algorithms. Based on the MC algorithm, an error estimation method using artificial neural network is also presented to fuse available information that includes RSS samples and localization results computed by the MC algorithm and model the nonlinear relationship between the available information and localization errors. In the on-line phase, localization errors are estimated and then used to correct the localization results to reduce negative influences caused by a static radio-map and RP distribution. Experimental results demonstrate that the MC algorithm outperforms the other neighbor selection-based algorithms and the error estimation method can reduce the mean of localization errors by nearly half.