• Title/Abstract/Keywords: Multivariate statistical technique

Search results: 77 items, processing time 0.021 s

A Study on a Modified Recursive Principal Component Analysis Method (Modified Recursive PC)

  • 김동규;김아현;김현중
    • 응용통계연구
    • /
    • Vol. 24, No. 5
    • /
    • pp.963-977
    • /
    • 2011
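  • Principal component analysis (PCA), one of the important tools for reducing the dimensionality of multivariate data, is increasingly needed in applications that require real-time processing. The key step in PCA is obtaining the eigenvalues and eigenvectors of the sample covariance matrix, which is difficult to do in real time when the data are large and high-dimensional. To address this problem, Erdogmus et al. (2004) proposed the Recursive PCA method, which estimates the eigenvalues and eigenvectors of the covariance matrix using first-order perturbation theory. This method is quite accurate when the amount of added data is small, but its error grows as the amount of added data increases. This paper proposes the Modified Recursive PCA method, which modifies the Recursive PCA method of Erdogmus et al. (2004) by exploiting the mathematical relationship between the eigenvalues and eigenvectors of the covariance matrix. In addition, a simulation study compared the accuracy of the eigenvalue and eigenvector estimates of the Recursive PCA and Modified Recursive PCA methods, and confirmed that the proposed method gives more accurate estimates than the existing Recursive PCA method.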
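The first-order perturbation idea mentioned in this abstract can be sketched as follows. This is only a minimal illustration of the textbook first-order formulas for a symmetric matrix, not the algorithm of Erdogmus et al. (2004) and not the modified method the paper proposes. The function name `first_order_eig_update`, the matrices `C` and `E`, and the assumption of distinct eigenvalues are all introduced here for illustration.

```python
import numpy as np


def first_order_eig_update(eigvals, eigvecs, E):
    """Approximate the eigenpairs of C + E to first order in E.

    eigvals, eigvecs: eigen-decomposition of a symmetric matrix C
        (columns of eigvecs are orthonormal eigenvectors).
    E: small symmetric perturbation, e.g. the change in the sample
        covariance after new observations arrive.
    Assumes the eigenvalues of C are distinct.
    """
    # Perturbation expressed in the eigenbasis of C: P[j, i] = q_j^T E q_i
    P = eigvecs.T @ E @ eigvecs
    # Eigenvalue update: lambda_i + q_i^T E q_i
    new_vals = eigvals + np.diag(P)
    # gaps[j, i] = lambda_i - lambda_j (diagonal set to 1 to avoid 0/0)
    gaps = eigvals[None, :] - eigvals[:, None]
    np.fill_diagonal(gaps, 1.0)
    coeff = P / gaps
    np.fill_diagonal(coeff, 0.0)
    # Eigenvector update: q_i + sum_{j != i} P[j, i] / (lambda_i - lambda_j) * q_j
    new_vecs = eigvecs + eigvecs @ coeff
    new_vecs /= np.linalg.norm(new_vecs, axis=0)
    return new_vals, new_vecs


# Hypothetical example: a diagonal "covariance" and a small symmetric change
C = np.diag([1.0, 2.0, 3.0])
E = 1e-3 * np.array([[1.0, 2.0, 0.5],
                     [2.0, -1.0, 1.0],
                     [0.5, 1.0, 0.0]])
vals, vecs = np.linalg.eigh(C)
approx_vals, approx_vecs = first_order_eig_update(vals, vecs, E)
exact_vals, exact_vecs = np.linalg.eigh(C + E)
print("max eigenvalue error:", np.max(np.abs(approx_vals - exact_vals)))
```

Because the neglected terms are second order in `E`, the approximation degrades as the perturbation grows, which is consistent with the limitation the abstract describes for large amounts of added data.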

Comparison of several criteria for ordering independent components

  • 최은빈;조수림;박미라
    • 응용통계연구
    • /
    • Vol. 30, No. 6
    • /
    • pp.889-899
    • /
    • 2017
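  • Independent component analysis (ICA) is a multivariate analysis method used to separate the original source signals from mixed signals, and it is the most widely used method for blind source separation. Like principal component analysis and factor analysis, ICA uses a linear transformation, but it differs in requiring the source signals to be statistically independent and non-Gaussian. Unlike principal component analysis, where a larger proportion of explained variance indicates a more important component, ICA has no single agreed criterion for determining the order of importance of the independent components. Because follow-up analyses such as cluster analysis or dimension-reduced plots use only some of the major independent components, it is meaningful to determine the order of the components. This study compared the performance of several criteria for ordering the components. The methods considered use kurtosis, the absolute value of kurtosis, negentropy, the Kolmogorov-Smirnov statistic, and the sum of squared coefficients. They were evaluated by their ability to classify known groups. Analysis results for two types of data are presented.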
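One of the criteria listed above, the absolute value of kurtosis, can be sketched in a few lines. This is an illustrative sketch only: the function names, the simulated "components", and the use of the population-moment estimator of excess kurtosis are assumptions, and the abstract does not say which estimator the authors used.

```python
import random


def excess_kurtosis(x):
    """Excess kurtosis from population moments: m4 / m2^2 - 3."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 ** 2) - 3.0


def order_by_abs_kurtosis(components):
    """Return component indices sorted by decreasing |excess kurtosis|."""
    scores = [abs(excess_kurtosis(c)) for c in components]
    return sorted(range(len(components)), key=lambda i: scores[i], reverse=True)


# Hypothetical "independent components" with known theoretical kurtosis
rng = random.Random(0)
n = 20000
gaussian = [rng.gauss(0, 1) for _ in range(n)]      # excess kurtosis near 0
uniform = [rng.uniform(-1, 1) for _ in range(n)]    # theoretical value -1.2
laplace = [rng.expovariate(1) - rng.expovariate(1)  # theoretical value +3
           for _ in range(n)]

order = order_by_abs_kurtosis([gaussian, uniform, laplace])
print(order)
```

Ranking by absolute kurtosis treats strongly sub-Gaussian and strongly super-Gaussian components as equally "interesting", whereas ranking by signed kurtosis would place the uniform-like component last.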

Bias corrected imputation method for non-ignorable non-response

  • 이민하;신기일
    • 응용통계연구
    • /
    • Vol. 35, No. 4
    • /
    • pp.485-499
    • /
    • 2022
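  • Managing total survey error, which comprises sampling error and non-sampling error, is very important in sample design. Non-sampling error caused by non-response accounts for a very large share of total error, and many studies have been conducted on non-response imputation as a way to address it. Recently, in addition to traditional statistical techniques, many imputation methods based on machine learning have been studied and put to practical use. Most published methods assume MCAR (missing completely at random) or MAR (missing at random). However, MNAR (missing not at random), or non-ignorable non-response (NN), in which missingness depends on the variable of interest, introduces bias and greatly reduces the accuracy of imputation, yet research on it is comparatively scarce. This study proposes an imputation method applicable when non-ignorable non-response occurs. In particular, it proposes improving the accuracy of imputation by estimating the bias and then removing it. The validity of the proposed method was confirmed through simulation.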
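The claim that non-ignorable non-response biases respondent-based estimates can be seen in a small simulation. The sketch below illustrates only the source of the bias, not the bias-corrected imputation method the paper proposes. The logistic response model, the normal population, and all parameter values are hypothetical.

```python
import math
import random


def simulate_respondent_bias(n=20000, seed=1):
    """Compare the full-sample mean with the respondent mean under MNAR."""
    rng = random.Random(seed)
    # Variable of interest for the full sample (hypothetical population)
    y = [rng.gauss(50.0, 10.0) for _ in range(n)]
    # MNAR: the probability of responding rises with y itself
    respond = [
        rng.random() < 1.0 / (1.0 + math.exp(-(v - 50.0) / 10.0))
        for v in y
    ]
    observed = [v for v, r in zip(y, respond) if r]
    full_mean = sum(y) / n
    resp_mean = sum(observed) / len(observed)
    return full_mean, resp_mean


full_mean, resp_mean = simulate_respondent_bias()
print("full-sample mean:", round(full_mean, 2))
print("respondent mean: ", round(resp_mean, 2))
```

Any imputation model fitted only to respondents inherits this upward shift unless the dependence of response on the variable of interest is modeled or the resulting bias is estimated and removed.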

Color Component Analysis For Image Retrieval

  • 최영관;최철;박장춘
    • 정보처리학회논문지B
    • /
    • Vol. 11B, No. 4
    • /
    • pp.403-410
    • /
    • 2004
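  • Recently, research on image analysis has been active as a preprocessing step for medical image analysis and image retrieval. This paper proposes a way to use color components in image retrieval. Images are retrieved on the basis of color components, and color is analyzed using the CLCM (Color Level Co-occurrence Matrix) together with statistical techniques. The CLCM, the subject proposed in this paper, projects color components onto a three-dimensional space through a geometric rotation transform and interprets the distribution that emerges from their spatial relationships. The CLCM refers to a two-dimensional histogram built from a color model and is generated through a geometric rotation transform of the color model. Statistical techniques are then used to analyze it. Algorithms such as the GLCM (Gray Level Co-occurrence Matrix) [1] and invariant moments [2,3], which use a two-dimensional distribution much as the CLCM does, rely on basic statistical techniques to interpret two-dimensional data. However, even if the GLCM and invariant moments are each optimized for their own domains, they cannot fully interpret irregular data in spatial coordinates. That is, because the GLCM and invariant moments use only elementary statistical techniques, the extracted features have low reliability. To overcome this shortcoming, this paper uses principal component analysis (PCA) [4,5], a typical multivariate statistical technique, to interpret spatial relationships and data weights at the same time. To improve the accuracy of the data, it also presents a method that projects the color components onto a three-dimensional space and rotates them to extract the characteristics of the data from multiple angles.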

Factor Analysis of Genetic Evaluations For Type Traits of Canadian Holstein Sires and Cows

  • Ali, A.K.;Koots, K.R.;Burnside, E.B.
    • Asian-Australasian Journal of Animal Sciences
    • /
    • Vol. 11, No. 5
    • /
    • pp.463-469
    • /
    • 1998
  • Factor analysis was applied as a multivariate statistical technique to official genetic evaluations of type classification traits for 1,265,785 Holstein cows and 10,321 sires computed from data collected between August 1982 and June 1994 in Canada. Type traits included eighteen linear descriptive traits and eight major score card traits. Principal components of the factor analysis showed that only five factors explain the information of the genetic value of linear descriptive traits for both cows and sires. Factor 1 included traits related to mammary system, like texture, median suspensory, fore attachment, fore teat placement and rear attachment height and width. Factor 2 described stature, size, chest width and pin width. These two factors had a similar pattern for both cows and sires. In contrast, Factor 3 for cows involved only bone quality, while for sires Factor 3 additionally included foot angle, rear legs desirability and legs set. Factor 4 for cows related to foot angle, set of rear leg and leg desirability, while Factor 4 related to loin strength and pin setting for sires. Finally, Factor 5 included loin strength and pin setting for cows and described only pin setting for sires. Only two factors were required to describe score card traits of cows and sires. Factor 1 related to final score, feet and legs, udder traits, mammary system and dairy character, while frame/capacity and rump were described by Factor 2. Communality estimates, which determine the proportion of variance of a type trait that is shared with other type traits via the common factor variant, were high, with the highest ($\geq$ 80%) for final score, stature, size and chest width. Pin width and pin desirability had the lowest communality, 56% and 37%. Results indicated shifts in emphasis over the twelve-year period away from udder traits and dairy character, and towards size, scale and width traits. A new system that computes final score from type components has been initiated.
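The communality referred to above has a simple arithmetic form when the common factors are orthogonal: it is the sum of a trait's squared loadings across the factors. The sketch below uses hypothetical loadings and trait labels, not values from the study, and assumes orthogonal factors.

```python
def communalities(loadings):
    """Communality of each variable: sum of squared loadings over the factors.

    loadings: list of rows, one row per variable, one entry per common factor.
    Assumes orthogonal (uncorrelated) common factors.
    """
    return [sum(l * l for l in row) for row in loadings]


# Hypothetical loadings of three traits on two common factors
loadings = [
    [0.9, 0.1],  # trait A: mostly explained by factor 1
    [0.6, 0.5],  # trait B: shared between both factors
    [0.3, 0.2],  # trait C: poorly explained by the common factors
]

h2 = communalities(loadings)
for name, value in zip(["A", "B", "C"], h2):
    print(f"trait {name}: communality = {value:.2f}")
```

A communality near 1 means almost all of a trait's variance is shared with the other traits through the common factors, while a low value, such as the 37% reported for pin desirability, means most of its variance is specific to that trait.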

Hierarchically penalized sparse principal component analysis

  • 강종경;박재신;방성완
    • 응용통계연구
    • /
    • Vol. 30, No. 1
    • /
    • pp.135-145
    • /
    • 2017
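  • Principal component analysis (PCA) is a representative technique for reducing the dimensionality of correlated multivariate data and is used in many multivariate analyses. However, because each principal component is a linear combination of all the variables, the results are difficult to interpret. The sparse PCA (SPCA) method uses an elastic-net-type penalty function to produce modified principal components with sparser loadings, but it cannot make use of group structure among the variables. This study therefore improves on the existing SPCA and presents a new principal component analysis method that, when the data are grouped, selects significant groups and at the same time removes unnecessary variables within each group. To use the group and within-group variable structure in model fitting, a hierarchical penalty function is considered in place of the elastic net penalty of sparse PCA. The performance and usefulness of the proposed method are demonstrated through analysis of real data.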
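For reference, the elastic-net-penalized criterion usually associated with SPCA can be written as below. The abstract itself does not reproduce it, so the notation is an assumption: $x_i$ are the centered observations, $A$ and $B = (\beta_1, \dots, \beta_k)$ are $p \times k$ matrices, and $\lambda$, $\lambda_{1,j}$ are tuning parameters. The hierarchical penalty proposed in the paper is not specified in the abstract and is deliberately not shown.

```latex
(\hat{A}, \hat{B}) = \arg\min_{A,\,B}\;
  \sum_{i=1}^{n} \left\lVert x_i - A B^{\top} x_i \right\rVert_2^2
  + \lambda \sum_{j=1}^{k} \lVert \beta_j \rVert_2^2
  + \sum_{j=1}^{k} \lambda_{1,j} \lVert \beta_j \rVert_1
\quad \text{subject to } A^{\top} A = I_k .
```

The $\ell_1$ term drives individual loadings in $\beta_j$ to exactly zero, which is what makes the components sparse. It acts on each variable separately, which is why this form cannot exploit group structure.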

Can a spontaneous smile invalidate facial identification by photo-anthropometry?

  • Pinto, Paulo Henrique Viana;Rodrigues, Caio Henrique Pinke;Rozatto, Juliana Rodrigues;da Silva, Ana Maria Bettoni Rodrigues;Bruni, Aline Thais;da Silva, Marco Antonio Moreira Rodrigues;da Silva, Ricardo Henrique Alves
    • Imaging Science in Dentistry
    • /
    • Vol. 51, No. 3
    • /
    • pp.279-290
    • /
    • 2021
  • Purpose: Using images in the facial image comparison process poses a challenge for forensic experts due to limitations such as the presence of facial expressions. The aims of this study were to analyze how morphometric changes in the face during a spontaneous smile influence the facial image comparison process and to evaluate the reproducibility of measurements obtained by digital stereophotogrammetry in these situations. Materials and Methods: Three examiners used digital stereophotogrammetry to obtain 3-dimensional images of the faces of 10 female participants (aged between 23 and 45 years). Photographs of the participants' faces were captured with their faces at rest (group 1) and with a spontaneous smile (group 2), resulting in a total of 60 3-dimensional images. The digital stereophotogrammetry device obtained the images with a 3.5-ms capture time, which prevented undesirable movements of the participants. Linear measurements between facial landmarks were made, in units of millimeters, and the data were subjected to multivariate and univariate statistical analyses using Pirouette® version 4.5 (InfoMetrix Inc., Woodinville, WA, USA) and Microsoft Excel® (Microsoft Corp., Redmond, WA, USA), respectively. Results: The measurements that most strongly influenced the separation of the groups were related to the labial/buccal region. In general, the data showed low standard deviations, which differed by less than 10% from the measured mean values, demonstrating that the digital stereophotogrammetry technique was reproducible. Conclusion: The impact of spontaneous smiles on the facial image comparison process should be considered, and digital stereophotogrammetry provided good reproducibility.

Burden of Neck Pain and Associated Factors Among Sewing Machine Operators of Garment Factories in Mekelle City, Northern Part of Ethiopia, 2018, A Cross-Sectional Study

  • Biadgo, Gebremedhin H.;Tsegay, Gebrerufael S.;Mohammednur, Sumeya A.;Gebremeskel, Berihu F.
    • Safety and Health at Work
    • /
    • Vol. 12, No. 1
    • /
    • pp.51-56
    • /
    • 2021
  • Background: Neck pain is a major public health problem among sewing machine operators working in textile factories. Although textile industries are growing in number in Ethiopia, there is a dearth of published studies on the prevalence of neck pain. Therefore, this study aimed to assess the prevalence of neck pain and its associated factors among sewing machine operators of garment factories in Mekelle city. Method: An institution-based cross-sectional study was conducted among 297 sewing machine operators working in garment factories in Mekelle city. A systematic random sampling technique was used. Data were collected through interviews and analyzed using Statistical Package for Social Science version 23. Finally, variables with p < 0.05 in the multivariate analysis, reported with 95% confidence intervals (CI), were declared significant. Results: Two hundred ninety-seven sewing machine operators were enrolled, with a response rate of 98.7%. In this study, the 12-month prevalence rate of neck pain was found to be 42.3% (95% CI: 36.6%-47.9%), and variables such as break time [adjusted odds ratio (AOR): 5.888, 95% CI: (2.775-12.493)], working hours per day [AOR: 6.495, 95% CI: (2.216-19.038)], static posture [AOR: 4.487, 95% CI: (1.640-12.275)], and repetitive activity [AOR: 4.519, 95% CI: (2.057-9.924)] were associated with neck pain. Conclusion: In this study, neck pain is a major public health problem. Continuous work without break time, working more than 8 hours per day, sitting in the same position for more than 2 hours, and highly repetitive activities were found to be significantly associated with neck pain. Owners and governmental bodies need to focus on developing preventive strategies and safety guidelines.
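The reported odds ratios and intervals can be related through the usual Wald construction on the log-odds scale, where the interval is exp(beta ± 1.96 × SE). The sketch below assumes the reported intervals are Wald intervals from logistic regression, which the abstract does not state explicitly. The function names are introduced here for illustration, and only the numbers are taken from the abstract.

```python
import math


def wald_or_ci(beta, se, z=1.96):
    """Odds ratio and Wald interval from a logistic regression coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def back_out_beta_se(lower, upper, z=1.96):
    """Recover beta and its standard error from a reported interval,
    assuming the interval is symmetric on the log scale (Wald)."""
    beta = (math.log(lower) + math.log(upper)) / 2.0
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return beta, se


# (reported AOR, lower bound, upper bound) as given in the abstract
reported = {
    "break time": (5.888, 2.775, 12.493),
    "working hours per day": (6.495, 2.216, 19.038),
    "static posture": (4.487, 1.640, 12.275),
    "repetitive activity": (4.519, 2.057, 9.924),
}

recovered = {}
for name, (aor, lower, upper) in reported.items():
    beta, se = back_out_beta_se(lower, upper)
    recovered[name] = wald_or_ci(beta, se)
    print(f"{name}: reported AOR {aor}, recovered {recovered[name][0]:.3f}")
```

Each reported AOR coincides, to rounding, with the geometric mean of its interval bounds, which is what a Wald interval implies. The wide intervals reflect large standard errors on the log scale.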

The Role of Customer Values in Increasing Tourist Satisfaction in Gianyar Regency, Bali, Indonesia

  • CEMPENA, Ida Bagus;BRAHMAYANTI, Ida Ayu Sri;ASTAWINETU, Erwin Dyah;PANJAITAN, Feliks Anggia B.K.;KARTINI, Ida Ayu Nuh;PANJAITAN, Hotman
    • The Journal of Asian Finance, Economics and Business
    • /
    • Vol. 8, No. 8
    • /
    • pp.553-563
    • /
    • 2021
  • Customer value has long been believed to be a direct trigger for increased tourist satisfaction, but as a mediating variable, it still needs to be proven further. This paper aims to examine the causal relationship between research variables, as well as to examine the role of customer value as a mediating variable in the relationship between service quality, brand quality, tourism products, customer value, and tourist satisfaction with tourists' objects. The population is tourists who visit tourist sites/destinations in the Gianyar Regency on the island of Bali, Indonesia, and the sample size is 270 respondents, selected through random sampling. Structural equation modeling (SEM), a multivariate statistical analysis technique, is used to analyze the causal relationships between variables. The results show that the model is accepted, and customer value is proven to be a positive mediating variable. The results also show that service quality, brand quality, and tourism products have an effect on customer value. This provides insight into the practical implications for tourism managers to increase the brand quality of tourist attractions as well as increase the professionalism and quality of tour guide services. This, in turn, will increase customer value and increase tourist satisfaction.
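The role of a mediating variable can be expressed with the standard decomposition of effects in a path model: the indirect effect is the product of the two path coefficients through the mediator, and the total effect is the direct effect plus the indirect effect. The sketch below uses hypothetical standardized coefficients, not estimates from the study, and considers a single predictor and a single mediator for simplicity.

```python
def decompose_effects(a, b, c_direct):
    """Decompose effects in a simple mediation model X -> M -> Y.

    a: path coefficient from X to the mediator M
    b: path coefficient from M to the outcome Y
    c_direct: direct path coefficient from X to Y
    Returns (direct, indirect, total).
    """
    indirect = a * b
    return c_direct, indirect, c_direct + indirect


# Hypothetical values: service quality -> customer value -> tourist satisfaction
direct, indirect, total = decompose_effects(a=0.5, b=0.6, c_direct=0.2)
print(f"direct={direct:.2f}, indirect={indirect:.2f}, total={total:.2f}")
```

In this hypothetical case, more of the effect of service quality on satisfaction passes through customer value than flows directly, which is the pattern a "positive mediating variable" would produce.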

A Meta Analysis of Using Structural Equation Model on the Korean MIS Research

  • 김종기;전진환
    • Asia pacific journal of information systems
    • /
    • Vol. 19, No. 4
    • /
    • pp.47-75
    • /
    • 2009
  • Recently, research on Management Information Systems (MIS) has laid out theoretical foundations and academic paradigms by introducing diverse theories, themes, and methodologies. In particular, academic paradigms of MIS encourage a user-friendly approach by developing technologies from the users' perspectives, which reflects the existence of strong causal relationships between information systems and users' behavior. As in other areas of social science, the use of structural equation modeling (SEM) has rapidly increased in recent years, especially in the MIS area. The SEM technique is important because it provides powerful ways to address key IS research problems. It also has a unique ability to examine a series of causal relationships simultaneously while analyzing multiple independent and dependent variables at the same time. Despite its many benefits to MIS researchers, the technique has some potential pitfalls. The objective of this study is to provide guidelines for the appropriate use of SEM based on an assessment of current practice in MIS research. This study focuses on several statistical issues related to the use of SEM in MIS research. Selected articles are assessed in three parts through the meta analysis. The first part is related to the initial specification of the theoretical model of interest. The second is about data screening prior to model estimation and testing. The last part concerns estimation and testing of theoretical models based on empirical data. This study reviewed the use of SEM in 164 empirical research articles published in four major MIS journals in Korea (APJIS, ISR, JIS and JITAM) from 1991 to 2007. APJIS, ISR, JIS and JITAM accounted for 73, 17, 58, and 16 of the total number of applications, respectively. The number of published applications has increased over time.
LISREL was the most frequently used SEM software among MIS researchers (97 studies (59.15%)), followed by AMOS (45 studies (27.44%)). In the first part, regarding issues related to the initial specification of the theoretical model of interest, all of the studies used cross-sectional data. The studies that use cross-sectional data may be able to better explain their structural model as a set of relationships. Most SEM studies, meanwhile, have employed confirmatory-type analysis (146 articles (89%)). Regarding model specification, 159 (96.9%) of the studies used the full structural equation model. In only 5 studies was SEM used for the measurement model with a set of observed variables. The average sample size for all models was 365.41, with some models retaining a sample as small as 50 and as large as 500. The second part of the issue is related to data screening prior to model estimation and testing. Data screening is important for researchers, particularly in defining how they deal with missing values. Overall, discussion of data screening was reported in 118 (71.95%) of the studies, while no study discussed evidence of multivariate normality for the models. In the third part, on issues related to the estimation and testing of theoretical models on empirical data, assessing model fit is one of the most important issues because it provides adequate statistical power for research models. Multiple fit indices were used in the SEM applications. The $\chi^2$ test was reported in most of the studies (146 (89%)), whereas the normed $\chi^2$ test was reported less frequently (65 studies (39.64%)). It is important that a normed $\chi^2$ of 3 or lower is required for adequate model fit. The most popular model fit indices were GFI (109 (66.46%)), AGFI (84 (51.22%)), NFI (44 (47.56%)), RMR (42 (25.61%)), CFI (59 (35.98%)), RMSEA (62 (37.80%)), and NNFI (48 (29.27%)).
Regarding the test of construct validity, convergent validity was examined in 109 studies (66.46%) and discriminant validity in 98 (59.76%). 81 studies (49.39%) reported the average variance extracted (AVE). However, there was little discussion of direct (47 (28.66%)), indirect, and total effects in the SEM models. Based on these findings, we suggest general guidelines for the use of SEM and propose some recommendations concerning latent variable models, raw data, sample size, data screening, reporting of parameter estimates, model fit statistics, multivariate normality, confirmatory factor analysis, reliabilities and the decomposition of effects.
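Two of the quantities discussed in this abstract reduce to simple arithmetic, sketched below with hypothetical inputs. The normed chi-square is the model chi-square divided by its degrees of freedom, and the AVE of a construct is the mean of the squared standardized loadings of its indicators. The cutoff of 3 comes from the abstract. The 0.5 threshold for AVE is a commonly cited rule of thumb, not a value stated in the source.

```python
def normed_chi_square(chi_square, df):
    """Normed chi-square: model chi-square divided by degrees of freedom."""
    return chi_square / df


def average_variance_extracted(std_loadings):
    """AVE: mean of squared standardized loadings of a construct's indicators."""
    return sum(l * l for l in std_loadings) / len(std_loadings)


# Hypothetical model fit and loadings (not taken from any reviewed study)
ratio = normed_chi_square(chi_square=240.0, df=100)
ave = average_variance_extracted([0.8, 0.7, 0.9])

print(f"normed chi-square = {ratio:.2f} (adequate if 3 or lower)")
print(f"AVE = {ave:.3f} (0.5 or higher is a common rule of thumb)")
```

Reporting both alongside the other fit indices lets readers check model fit and convergent validity without access to the raw data.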