• Title/Summary/Keyword: Multi-collinearity

Search Result 26

Biased-Recovering Algorithm to Solve a Highly Correlated Data System (상관관계가 강한 독립변수들을 포함한 데이터 시스템 분석을 위한 편차 - 복구 알고리듬)

  • 이미영
    • Journal of the Korean Operations Research and Management Science Society / v.28 no.3 / pp.61-66 / 2003
  • In many multiple regression analyses, the “multi-collinearity” problem arises because some independent variables are highly correlated with each other. In practice, ridge regression is often adopted to deal with the problems resulting from multi-collinearity. We propose a better alternative method that uses iteration to obtain an exact least squares estimator. We prove the solvability of the proposed algorithm mathematically and then compare our method with the traditional one.
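
To make the ridge baseline mentioned in this abstract concrete, here is a minimal sketch for two nearly collinear predictors. It is not the authors' iterative algorithm, which the abstract does not specify. The data, the penalty value, and the function names are all hypothetical.

```python
def solve_2x2(a, b, c, d, e, f):
    """Solve [[a, b], [c, d]] [x, y]^T = [e, f]^T with Cramer's rule."""
    det = a * d - b * c
    return (e * d - b * f) / det, (a * f - e * c) / det


def ridge_two_predictors(x1, x2, y, lam):
    """Coefficients (b1, b2) of y ~ b1*x1 + b2*x2 without intercept.

    lam is the ridge penalty; lam = 0 reduces to ordinary least squares.
    """
    s11 = sum(u * u for u in x1) + lam
    s22 = sum(v * v for v in x2) + lam
    s12 = sum(u * v for u, v in zip(x1, x2))
    t1 = sum(u * w for u, w in zip(x1, y))
    t2 = sum(v * w for v, w in zip(x2, y))
    return solve_2x2(s11, s12, s12, s22, t1, t2)


# Hypothetical data: x2 is almost a copy of x1, so X'X is nearly singular.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.01, 1.99, 3.02, 3.98, 5.0]
noise = [0.1, -0.2, 0.15, -0.1, 0.05]
y = [u + v + e for u, v, e in zip(x1, x2, noise)]

ols = ridge_two_predictors(x1, x2, y, lam=0.0)
ridge = ridge_two_predictors(x1, x2, y, lam=1.0)
print("OLS  :", ols)
print("ridge:", ridge)
```

The penalty shrinks the coefficient vector, which stabilizes the estimate at the cost of bias. That bias is presumably the motivation for seeking an exact least squares estimator instead.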

An Analysis of the Economic Effects of R&D Investment in the IT Industry (IT산업 연구개발 투자의 경제적 효과 분석)

  • Hong, Jae-Pyo;Choi, Na-Lin;Kim, Pang-Ryong
    • The Journal of Korean Institute of Communications and Information Sciences / v.37B no.9 / pp.837-848 / 2012
  • This study analyzes the economic effects of R&D investment in the IT industry using multiple regression with three independent variables: capital stock, labor input, and R&D stock. The IT industry is divided into three sub-industries: broadcasting communication appliances, information appliances, and electronic components. The analysis finds considerable autocorrelation, while the t-values and R-squared are significant in all the IT sub-industries. Meanwhile, the coefficient of R&D stock in the information appliances industry and that of labor input in the electronic components industry were negative, so multi-collinearity was suspected. We addressed the autocorrelation and multi-collinearity problems through Cochrane-Orcutt estimation and principal components analysis. The results imply that R&D investment in the broadcasting communication industry is much more influential than in any other IT sub-industry.
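
As a rough illustration of the Cochrane-Orcutt step named in this abstract, the sketch below treats a single regressor with AR(1) errors. The paper's model has three regressors and its data are not shown here, so the series and function names are hypothetical.

```python
def ols_simple(x, y):
    """Return (intercept, slope) of a simple least squares fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_with_rho(x, y, rho):
    """Fit y on x after quasi-differencing both series with a given rho."""
    xs = [x[t] - rho * x[t - 1] for t in range(1, len(x))]
    ys = [y[t] - rho * y[t - 1] for t in range(1, len(y))]
    a_star, slope = ols_simple(xs, ys)
    # The transformed intercept equals intercept * (1 - rho).
    return a_star / (1.0 - rho), slope


def cochrane_orcutt(x, y, max_iter=50, tol=1e-10):
    """Alternate between estimating rho from residuals and refitting."""
    intercept, slope = ols_simple(x, y)
    rho = 0.0
    for _ in range(max_iter):
        resid = [b - intercept - slope * a for a, b in zip(x, y)]
        lagged = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
        new_rho = lagged / sum(r * r for r in resid)
        intercept, slope = fit_with_rho(x, y, new_rho)
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return intercept, slope, rho


# Hypothetical series: y = 2 + 3x plus errors that decay as e_t = 0.5 * e_{t-1}.
x = [float(t) for t in range(1, 13)]
errors = [0.5 ** t for t in range(12)]
y = [2.0 + 3.0 * a + e for a, e in zip(x, errors)]
print(cochrane_orcutt(x, y))
```

Quasi-differencing with the estimated rho removes the first-order serial correlation from the errors. This is why the method is a common remedy when autocorrelation is detected.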

Development of Formative Constructs and Measurements for Performance Evaluation of Information Systems (정보시스템 성과평가를 위한 형성적 구성변수(Constructs) 및 측정지표 개발)

  • Kim, Sanghoon;Kim, Changkyu
    • Journal of Information Technology Services / v.11 no.4 / pp.135-151 / 2012
  • Traditionally in IS studies, the relationship between a construct and its measurement items has been assumed to be reflective, meaning that the measurements reflect the construct. In reality, however, a construct is often formative, which means that its measurement items describe and define the construct rather than vice versa. The purpose of this study was to investigate the theoretical and empirical differences between formative and reflective constructs through a comprehensive interdisciplinary literature review. On the basis of these differences, we derive rules for specifying whether a construct is formative or reflective and propose a methodology for testing the validity (content validity, construct validity, internal consistency, and external construct) of a formative construct and its measurements, as distinct from the reflective case. We also suggest concrete statistical testing methods such as the VTT (Vanishing Tetrad Test), the MIMIC (Multiple Indicators and Multiple Causes) test, and the multi-collinearity test. To examine the applicability of this methodology to developing constructs for performance evaluation of IS (Information Systems), we identified the attribute (formative or reflective) and tested the validity of a construct arbitrarily chosen from those derived in our previous IS performance evaluation study. The examination showed that the proposed methodology was valid and effective in the area of IS performance evaluation.
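
The abstract does not say which multi-collinearity statistic the authors used. One common choice for formative indicators is the variance inflation factor (VIF), sketched below as the diagonal of the inverse correlation matrix. The indicator values and function names are hypothetical.

```python
from math import sqrt


def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long columns."""
    n = len(columns[0])
    scaled = []
    for col in columns:
        mean = sum(col) / n
        dev = [v - mean for v in col]
        norm = sqrt(sum(d * d for d in dev))  # zero for a constant column
        scaled.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in scaled]
            for ci in scaled]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF of each column: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Hypothetical indicator scores; the second column nearly repeats the first.
indicators = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.1, 1.9, 3.2, 3.9, 5.1],
    [2.0, 1.0, 4.0, 3.0, 5.0],
]
print(vif(indicators))
```

A frequently quoted rule of thumb treats VIF values above roughly 5 to 10 as a warning sign. The appropriate cutoff depends on the study.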

Analysis of the Productivity and Effects of Administration Information System: Focused on KONEPS(Korea Online E-Procurement System) (행정업무시스템의 생산성 및 효과 분석: 나라장터 중심으로)

  • Kim, Hun-Hee;Oh, Changsuk
    • The Journal of Society for e-Business Studies / v.22 no.2 / pp.123-136 / 2017
  • Evaluation and analysis methods for information systems (IS) have been studied from the system, user, and management perspectives. Detailed analyses typically perform qualitative evaluation through user questionnaires or expert opinion. This study measures the productivity and the effect of building administrative information systems. Previous studies used qualitative productivity and universal effect indicators, whereas this study selects quantitative productivity indicators and indicators specific to administrative complaints. For KONEPS, an administrative service system, electronic contract records and information recorded in intermediate processes were used. The information was converted into numbers of days, and productivity was calculated relative to the input manpower. The effect analysis examined questionnaires related to civil affairs, which is the goal of the administrative work system. Each factor was classified as a reflective or formative structural variable, and internal consistency and multi-collinearity were diagnosed. To verify the model, the influence of the work was set as a hypothesis, reliability was verified with descriptive statistics, influence was measured through regression analysis, and the model was analyzed using the path coefficients of the multiple regression model. Model validation used Chi-square (df, p), RMR, GFI, AGFI, NFI, and CFI as indicators according to CFA.

Multivariate Procedure for Variable Selection and Classification of High Dimensional Heterogeneous Data

  • Mehmood, Tahir;Rasheed, Zahid
    • Communications for Statistical Applications and Methods / v.22 no.6 / pp.575-587 / 2015
  • Developments in data collection techniques have resulted in high dimensional data sets, where discrimination is an important and commonly encountered problem that is crucial to resolve when the high dimensional data are heterogeneous (non-common variance covariance structure for classes). An example of this is classifying microbial habitat preferences based on codon/bi-codon usage. Habitat preference is important to study for evolutionary genetic relationships and may help industry produce specific enzymes. Most classification procedures assume homogeneity (common variance covariance structure for all classes), which is not guaranteed in most high dimensional data sets. We introduce regularized elimination in partial least squares coupled with QDA (rePLS-QDA) for parsimonious variable selection and classification of high dimensional heterogeneous data sets, based on the recently introduced regularized elimination for variable selection in partial least squares (rePLS) and the heterogeneous classification procedure quadratic discriminant analysis (QDA). The proposed and existing methods are compared on a simulated data set; in addition, the proposed procedure is applied to classify microbial habitat preferences by their codon/bi-codon usage. Five bacterial habitats (Aquatic, Host Associated, Multiple, Specialized and Terrestrial) are modeled. The classification accuracy for each habitat is satisfactory and ranges from 89.1% to 100% on test data. Interesting codon/bi-codon usages and their mutual interactions that influence the respective habitat preferences are identified. The proposed method also produced results that concurred with known biological characteristics, which will help researchers better understand the divergence of species.

Registration of Aerial Image with Lines using RANSAC Algorithm

  • Ahn, Y.;Shin, S.;Schenk, T.;Cho, W.
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.25 no.6_1 / pp.529-536 / 2007
  • Registration between image and object space is a fundamental step in photogrammetry and computer vision. With the rapid development of sensors (multi/hyper spectral sensors, laser scanning sensors, radar sensors, etc.), the need for registration between different sensors is ever increasing. There are two important considerations in registering different sensors: extracting sensor invariant features and establishing correspondence between them. Since point to point correspondence does not exist between image and laser scanning data, higher-level entities are needed for extraction and correspondence. This requires modifying, first, the existing mathematical and geometrical model, which was designed for point measurements, so that it handles line measurements and, second, the matching scheme. In this research, linear features are selected as the sensor invariant features and matching entities. Linear features are incorporated into the mathematical model in the form of an extended collinearity equation for the registration problem known as photo resection, which calculates exterior orientation parameters. The other emphasis is on a scheme for finding matched entities with the aid of RANSAC (RANdom SAmple Consensus) in the absence of correspondences. To relieve the computational load, which is a common problem in sampling-based approaches, a deterministic sampling technique and the selection of 4 line features from 4 sectors are applied.
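
For readers unfamiliar with RANSAC, the following is a generic sketch of the sample-and-score loop on a 2D line-fitting toy problem. It is not the authors' line-feature resection or their deterministic sector sampling. The points, threshold, and function names are hypothetical.

```python
import random


def fit_line(p, q):
    """Slope and intercept of the line through two points; None if vertical."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        return None
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


def ransac_line(points, iterations=200, threshold=0.1, seed=0):
    """Return (slope, intercept, inliers) of the best-supported line."""
    rng = random.Random(seed)
    best = (None, None, [])
    for _ in range(iterations):
        # Minimal sample: two points determine a candidate line.
        model = fit_line(*rng.sample(points, 2))
        if model is None:
            continue
        slope, intercept = model
        # Score the candidate by its consensus set (vertical residuals).
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= threshold]
        if len(inliers) > len(best[2]):
            best = (slope, intercept, inliers)
    return best


# Hypothetical data: ten points on y = 2x + 1 plus three gross outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(10)]
points += [(2.0, 20.0), (5.0, -7.0), (8.0, 30.0)]
slope, intercept, inliers = ransac_line(points)
print(slope, intercept, len(inliers))
```

Purely random sampling can need many iterations when the inlier ratio is low. This is the computational load that the abstract's deterministic sampling is meant to relieve.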

Defect Severity-based Dimension Reduction Model using PCA (PCA를 적용한 결함 심각도 기반 차원 축소 모델)

  • Kwon, Ki Tae;Lee, Na-Young
    • Journal of Software Assessment and Valuation / v.15 no.1 / pp.79-86 / 2019
  • Software dimension reduction identifies the commonality of elements and extracts important feature elements. It thereby reduces complexity through simplification and solves multi-collinearity problems. It also reduces redundancy by performing redundancy and noise detection. In this study, we propose a defect severity-based dimension reduction model. The proposed model is applied to a defect severity-based NASA dataset. It identifies the number of column dimensions that affect the severity of a defect, and it then compares and analyzes the dimensions of the data before and after reduction. In the experiments, the number of dimensions of the PC4 dataset was 2 to 3, showing that the dimension could be reduced.
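
The abstract does not describe the PCA implementation, so the sketch below is an assumption. It finds only the first principal component by power iteration on the covariance matrix and reports the share of variance it explains. The metric values and function names are hypothetical.

```python
from math import sqrt


def covariance_matrix(rows):
    """Sample covariance matrix of rows (observations) by columns (features)."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(k)] for i in range(k)]


def first_principal_component(rows, iterations=200):
    """Return (direction, explained variance ratio) of the leading component.

    Power iteration assumes the start vector is not orthogonal to the
    leading eigenvector and that the covariance matrix is not all zeros.
    """
    cov = covariance_matrix(rows)
    k = len(cov)
    v = [1.0 / sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(k))
                     for i in range(k))
    total_variance = sum(cov[i][i] for i in range(k))
    return v, eigenvalue / total_variance


# Hypothetical metric rows; the second column is roughly twice the first.
rows = [
    (1.0, 2.1, 0.5),
    (2.0, 3.9, 0.4),
    (3.0, 6.2, 0.6),
    (4.0, 7.8, 0.5),
    (5.0, 10.1, 0.4),
]
direction, ratio = first_principal_component(rows)
print(direction, ratio)
```

When a few components explain most of the variance, correlated columns can be replaced by those components. Reducing a dataset to 2 or 3 dimensions amounts to doing exactly that.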

Development of Traffic Accident Rate Forecasting Models for Trumpet IC Exit Ramp of Freeway using Variables Transformation Method (변수변환 기법을 이용한 고속도로 트럼펫IC 유출연결로 교통사고율 예측모형 개발)

  • Yoon, Byoung-Jo
    • International Journal of Highway Engineering / v.10 no.4 / pp.139-150 / 2008
  • This study focuses on developing forecasting models for accidents on trumpet interchange (IC) ramps, because accidents occur more frequently on ramps than on basic freeway sections and ramp accidents are increasing. The independent variables were selected through statistical analysis (correlation analysis, multi-collinearity, etc.) by ramp type (direct, semi-direct, and loop). Because the relationship between the independent variables and the accident rate is non-linear, new variables were created by transforming the independent variables. The forecasting models for each exit-ramp type (direct, semi-direct, and loop) were built with multiple regression using the all-possible-regressions method. The forecasts of the models showed statistically high accuracy. The developed models are expected to help design trumpet IC ramps more cost-efficiently and safely and to analyze the causes of traffic accidents on IC ramps.

Prevalence and risk factors of helminth infections in cattle of Bangladesh

  • Rahman, A.K.M.A.;Begum, N.;Nooruddin, M.;Rahman, Md. Siddiqur;Hossain, M.A.;Song, Hee-Jong
    • Korean Journal of Veterinary Service / v.32 no.3 / pp.265-273 / 2009
  • A cross-sectional survey was undertaken to identify risk factors and clinical signs associated with parasitic helminth infections of cattle in Mymensingh district of Bangladesh. A nonrandom convenience sampling method was used to select 138 animals from 40 farmers/herds. The eggs per gram of faeces (epg) for nematodes and trematodes were determined by McMaster and Stoll's methods, respectively. Animal-level and herd-level data were recorded by means of a questionnaire. Multi-collinearity amongst explanatory variables was assessed using the $2{\times}2$ $\chi^2$ test, and one variable in a pair was dropped if $P{\leq}0.05$ for multiple logistic regression models. The association between outcome and explanatory variables was studied using classification trees, random forests and multiple logistic regression. An animal with a positive epg was considered infected. Analyses were performed using STATA®, version 8.0/Intercooled and R®, version 2.3.0. Seventy eight percent of the cattle were found to be infected with at least one type of helminth. Twenty four pairs of explanatory variables showed significant associations. Male animals (OR=3.3, P=.006, 95% CI=1.4, 7.7) were associated with significantly increased prevalence of nematode infection. Female cattle in the study area are mostly cross-bred, kept indoors, fed a relatively good diet and not used for draught purposes. Males are used for draught purposes and are thereby more exposed to the infective stage of nematodes, and they are given a relatively poor diet. Stressed male cattle may therefore become more susceptible to nematode infection. All three statistical techniques selected gender and lumen motility as the most important variables associated with nematode infection in cattle. The results of this survey can only be extrapolated to the periurban cattle population under the traditional management system.
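
The pairwise screening described in this abstract can be sketched as a Pearson chi-square test on each $2{\times}2$ table of explanatory variables. The authors ran their analyses in STATA and R. The Python below is only an illustration, with made-up counts and placeholder variable names, and without a continuity correction.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 d.f.) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    stat = n * (a * d - b * c) ** 2 / denom
    # For one degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2.0))


def correlated_pairs(tables, alpha=0.05):
    """Names of variable pairs whose 2x2 table gives p <= alpha."""
    return [pair for pair, cells in tables.items()
            if chi_square_2x2(*cells)[1] <= alpha]


# Made-up cell counts (a, b, c, d) for two hypothetical variable pairs.
tables = {
    ("var_a", "var_b"): (20, 5, 5, 20),
    ("var_a", "var_c"): (10, 10, 10, 10),
}
for pair in correlated_pairs(tables):
    print("drop one of", pair)
```

Dropping one variable from each significantly associated pair keeps strongly related predictors out of the same logistic regression model.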

Non-linear regression model considering all association thresholds for decision of association rule numbers (기본적인 연관평가기준 전부를 고려한 비선형 회귀모형에 의한 연관성 규칙 수의 결정)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.24 no.2 / pp.267-275 / 2013
  • Among data mining techniques, the association rule is the most recently developed technique, and it finds the relevance between two items in a large database. It is directly applied in the field because it clearly quantifies the relationship between two or more items. To determine whether an association rule is meaningful, we use interestingness measures such as support, confidence, and lift. Interestingness measures are meaningful in that they provide statistical or logical grounds for pruning uninteresting rules. However, the thresholds for these measures are chosen by experience, and the number of useful rules is hard to estimate. If too many rules are generated, the useful rules cannot be extracted effectively. In this paper, we designed a variety of non-linear regression equations, considering all association thresholds, between the number of rules and the three interestingness measures. We then diagnosed multi-collinearity and autocorrelation problems, and used analysis of variance results and adjusted coefficients of determination to select the best model through numerical experiments.
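
The three interestingness measures named in this abstract, and the way thresholds determine the number of surviving rules, can be sketched as follows. The sketch covers only single-item rules on a hypothetical transaction list and does not reproduce the paper's regression models. The function names are assumptions.

```python
from itertools import permutations


def support(transactions, items):
    """Share of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence relative to the consequent's baseline support."""
    return (confidence(transactions, antecedent, consequent)
            / support(transactions, consequent))


def count_rules(transactions, min_support, min_confidence, min_lift):
    """Number of single-item rules a -> b that pass all three thresholds."""
    items = sorted(set().union(*transactions))
    return sum(
        1 for a, b in permutations(items, 2)
        if support(transactions, {a, b}) >= min_support
        and confidence(transactions, {a}, {b}) >= min_confidence
        and lift(transactions, {a}, {b}) >= min_lift
    )


# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
    {"bread", "milk"},
]
print(count_rules(transactions, 0.6, 0.7, 0.0))
```

Tightening any threshold can only keep or reduce the rule count. This monotone relationship between thresholds and the number of rules is what the regression equations aim to capture.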