• Title/Summary/Keyword: principal component regression


Establishment of Strategy for Management of Technology Using Data Mining Technique (데이터 마이닝을 통한 기술경영 전략 수립에 관한 연구)

  • Lee, Junseok;Lee, Joonhyuck;Kim, Gabjo;Park, Sangsung;Jang, Dongsik
    • Journal of the Korean Institute of Intelligent Systems, v.25 no.2, pp.126-132, 2015
  • Technology forecasting aims to understand the future status of a specific technology based on its current data, and it is useful when planning technology management strategies. Today, countries, companies, and researchers commonly set R&D directions and strategies by drawing on expert opinion. However, this qualitative approach to technology forecasting is costly and time-consuming because it requires collecting and analyzing opinions from many experts. To address these limitations, quantitative methods of technology forecasting are being studied to obtain objective forecasts and support R&D decision making. This paper proposes a technology forecasting methodology based on quantitative analysis. The methodology consists of data collection, principal component analysis, and technology forecasting by logistic regression, one of the data mining techniques. In this research, patent documents related to autonomous vehicles are collected, and text mining is applied to extract their text and convert it into a form suitable for analysis. After principal component analysis, logistic regression is performed on the principal component scores. These results make it possible to analyze the state of R&D and to forecast technology development.
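
The pipeline described above (text features, then PCA, then logistic regression on the component scores) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the patents have already been converted to a document-term count matrix. The 8x5 matrix and the binary label are hypothetical; the label could stand for something like "belongs to a fast-growing technology cluster". Logistic regression is fitted by plain batch gradient descent in NumPy.

```python
import numpy as np

def pca_scores(X, k):
    """Return the first k principal component scores of X (rows = documents)."""
    Xc = X - X.mean(axis=0)                      # center each term column
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                         # project onto the top-k principal axes

def fit_logistic(Z, y, lr=0.1, n_iter=5000):
    """Fit P(y=1) = sigmoid(b0 + Z w) by batch gradient descent on the mean log-loss."""
    Z1 = np.column_stack([np.ones(len(Z)), Z])   # add an intercept column
    w = np.zeros(Z1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z1 @ w))
        w -= lr * Z1.T @ (p - y) / len(y)        # gradient step
    return w

def predict_proba(w, Z):
    Z1 = np.column_stack([np.ones(len(Z)), Z])
    return 1.0 / (1.0 + np.exp(-Z1 @ w))

# Hypothetical document-term counts: 8 patents x 5 terms
X = np.array([
    [5, 4, 0, 0, 1],
    [6, 5, 1, 0, 0],
    [4, 6, 0, 1, 0],
    [5, 5, 0, 0, 1],
    [0, 1, 5, 6, 4],
    [1, 0, 6, 5, 5],
    [0, 0, 4, 6, 6],
    [1, 1, 5, 4, 5],
], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)  # hypothetical labels

Z = pca_scores(X, k=2)
w = fit_logistic(Z, y)
probs = predict_proba(w, Z)
print(np.round(probs, 2))
```

Working on a handful of component scores instead of the raw term counts keeps the logistic model small, which is the usual motivation for this two-stage design.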

Comparison of Principal Component Regression and Nonparametric Multivariate Trend Test for Multivariate Linkage (다변량 형질의 유전연관성에 대한 주성분을 이용한 회귀방법와 다변량 비모수 추세검정법의 비교)

  • Kim, Su-Young;Song, Hae-Hiang
    • The Korean Journal of Applied Statistics, v.21 no.1, pp.19-33, 2008
  • The linear regression method proposed by Haseman and Elston (1972) for detecting linkage to a quantitative trait in sib pairs tests a single locus and a single trait. However, multivariate linkage methods are needed when each individual is measured on several traits that are affected by the same major gene. Amos et al. (1990) extended the Haseman-Elston regression to two or more traits by estimating the principal component linear function that yields the strongest correlation between the squared pair differences in the trait measurements and identity by descent at a marker locus. However, this method currently cannot control the type I error probability, because the exact distribution of its test statistic is unknown. In this paper, we propose a multivariate nonparametric trend test for detecting linkage to multiple traits. In a simulation study on quantitative trait data, we compared the efficiency of the multivariate nonparametric trend test with that of the method of Amos et al. (1990). The simulation results show that the multivariate nonparametric trend test has type I error rates close to the nominal significance levels and generally high power.
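
The following is a minimal sketch of the single-trait Haseman-Elston regression that the abstract starts from. For each sib pair, the squared trait difference is regressed on the proportion of alleles shared identical by descent (IBD) at a marker. A significantly negative slope suggests linkage. Several parts are illustrative assumptions:

- the simulated data,
- the additive-QTL variance model,
- the fully informative marker,
- the one-sided t statistic.

The sketch does not implement the principal-component extension of Amos et al. (1990) or the proposed nonparametric trend test.

```python
import numpy as np

def haseman_elston(y_sq_diff, pi_hat):
    """OLS of squared sib-pair differences on IBD sharing.
    Returns (slope, t statistic for H0: slope = 0; linkage is indicated by slope < 0)."""
    X = np.column_stack([np.ones_like(pi_hat), pi_hat])
    beta, *_ = np.linalg.lstsq(X, y_sq_diff, rcond=None)
    resid = y_sq_diff - X @ beta
    n, p = X.shape
    sigma2 = resid @ resid / (n - p)            # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)       # OLS covariance of the coefficients
    return beta[1], beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(0)
n_pairs = 2000
# IBD proportion at a fully informative marker: 0, 1/2, 1 with probabilities 1/4, 1/2, 1/4
pi_hat = rng.choice([0.0, 0.5, 1.0], size=n_pairs, p=[0.25, 0.5, 0.25])
sigma_g2, sigma_e2 = 1.0, 1.0                   # assumed additive QTL and residual variances
# Additive QTL at the marker (no recombination): Var(x1 - x2) = 2*sigma_e2 + 2*sigma_g2*(1 - pi)
diff = rng.normal(0.0, np.sqrt(2 * sigma_e2 + 2 * sigma_g2 * (1 - pi_hat)))
slope, t_stat = haseman_elston(diff ** 2, pi_hat)
print(f"slope = {slope:.2f} (theory: {-2 * sigma_g2}), t = {t_stat:.1f}")
```

Under this assumed model, E[(x1 - x2)^2 | pi] = 4 - 2*pi. The fitted slope should therefore be near -2.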

Repetitive model refinement for structural health monitoring using efficient Akaike information criterion

  • Lin, Jeng-Wen
    • Smart Structures and Systems, v.15 no.5, pp.1329-1344, 2015
  • The stiffness of a structure is one of several structural signals that are useful indicators of the amount of damage that has been done to the structure. To accurately estimate the stiffness, an equation of motion containing a stiffness parameter must first be established by expansion as a linear series model, a Taylor series model, or a power series model. The model is then used in multivariate autoregressive modeling to estimate the structural stiffness and compare it to the theoretical value. Stiffness assessment for modeling purposes typically involves the use of one of three statistical model refinement approaches, one of which is the efficient Akaike information criterion (AIC) proposed in this paper. If a newly added component of the model results in a decrease in the AIC value, compared to the value obtained with the previously added component(s), it is statistically justifiable to retain this new component; otherwise, it should be removed. This model refinement process is repeated until all of the components of the model are shown to be statistically justifiable. In this study, this model refinement approach was compared with the two other commonly used refinement approaches: principal component analysis (PCA) and principal component regression (PCR) combined with the AIC. The results indicate that the proposed AIC approach produces more accurate structural stiffness estimates than the other two approaches.
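
The AIC retention rule described above can be sketched as a single forward pass. Candidate terms are added one at a time, and each is kept only if it lowers the AIC. In this sketch the candidates are powers of one regressor, standing in for a power-series model. This is a simplified illustration under stated assumptions: it uses ordinary least squares with the Gaussian form AIC = n ln(RSS/n) + 2k on hypothetical data. It is not the multivariate autoregressive stiffness model used in the paper.

```python
import numpy as np

def aic_ols(X, y):
    """Gaussian AIC (up to an additive constant) for an OLS fit: n*ln(RSS/n) + 2k."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    n, k = X.shape
    return n * np.log(rss / n) + 2 * k

def refine_by_aic(candidates, y):
    """Add candidate columns in order; retain each one only if it lowers the AIC."""
    kept = [np.ones_like(y)]                # start from an intercept-only model
    kept_names = ["const"]
    best = aic_ols(np.column_stack(kept), y)
    for name, col in candidates:
        trial = aic_ols(np.column_stack(kept + [col]), y)
        if trial < best:                    # AIC decreased: the new component is justified
            kept.append(col)
            kept_names.append(name)
            best = trial
        # otherwise the new component is removed (not kept)
    return kept_names, best

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 300)
# Hypothetical response that depends only on x and x^3
y = 2.0 * x - 1.5 * x ** 3 + rng.normal(0, 0.1, x.size)
candidates = [(f"x^{p}", x ** p) for p in range(1, 6)]
names, best_aic = refine_by_aic(candidates, y)
print(names, round(best_aic, 1))
```

A fuller version would repeat the pass, as the abstract describes, until no retained component can be removed without raising the AIC.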

Suggestion of starting pitcher ability index in Korea baseball - Focusing on the sabermetrics statistics WAR (한국프로야구에서 선발투수의 투수능력지수 제안 - 대체선수대비승수 (WAR)을 중심으로)

  • Kim, Hyeon-Gyu;Lee, Jea-Young
    • Journal of the Korean Data and Information Science Society, v.28 no.4, pp.863-874, 2017
  • Wins above replacement (WAR) is the most commonly used of the many sabermetric statistics that measure baseball players' abilities. An advantage of WAR is that it allows players' performances to be compared even when they play different roles, such as pitcher and hitter. However, WAR is difficult to compute from commonly available records. In this paper, we therefore calculated sabermetric variables from Korean professional baseball records for the past three seasons (2014-2016) and used them to propose a starting pitcher ability index that can replace WAR. The index was calculated by arithmetic mean, weighted average, and principal component regression. Each version was then compared with WAR, and the most closely related method was selected as useful for identifying starting pitcher ability.
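
One way to build a principal component regression (PCR) index of the kind described above is sketched below. The variables are standardized, WAR is regressed on the leading principal component scores, and the fitted index is compared with a simple arithmetic-mean index by its correlation with WAR. The four variables, the choice of k = 2 components, and the simulated data are hypothetical. The paper's actual variables and weights are not reproduced here.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal component regression: OLS of y on the top-k PC scores of standardized X.
    Returns a function mapping new rows of X to fitted values."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Xs = (X - mu) / sd
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    V = Vt[:k].T                                   # loadings of the first k components
    Z = np.column_stack([np.ones(len(X)), Xs @ V])
    gamma, *_ = np.linalg.lstsq(Z, y, rcond=None)

    def predict(X_new):
        Z_new = np.column_stack([np.ones(len(X_new)), ((X_new - mu) / sd) @ V])
        return Z_new @ gamma

    return predict

rng = np.random.default_rng(2)
n = 120                                            # hypothetical pitcher-seasons
ability = rng.normal(size=n)                       # latent pitching quality
# Four hypothetical, positively oriented sabermetric variables with increasing noise
X = np.column_stack([ability + rng.normal(0, s, n) for s in (0.3, 0.5, 1.0, 2.0)])
war = 2.0 * ability + rng.normal(0, 0.5, n)        # hypothetical WAR values

mean_index = ((X - X.mean(axis=0)) / X.std(axis=0)).mean(axis=1)  # arithmetic-mean index
pcr_index = pcr_fit(X, war, k=2)(X)                               # PCR-based index

corrs = {label: float(np.corrcoef(idx, war)[0, 1])
         for label, idx in [("mean", mean_index), ("PCR", pcr_index)]}
print(corrs)
```

Because the PCR index here is fitted to WAR on the same data, its in-sample correlation is optimistic. Scoring it on a held-out season would give a fairer comparison.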

A comparison study of inverse censoring probability weighting in censored regression (중도절단 회귀모형에서 역절단확률가중 방법 간의 비교연구)

  • Shin, Jungmin;Kim, Hyungwoo;Shin, Seung Jun
    • The Korean Journal of Applied Statistics, v.34 no.6, pp.957-968, 2021
  • Inverse censoring probability weighting (ICPW) is a popular technique in survival data analysis. In applications of ICPW such as censored regression, it is crucial to estimate the censoring probability accurately. This article presents a simulation study of how the censoring probability estimate affects model performance in censored regression using the ICPW scheme. We compare three censoring probability estimators: the Kaplan-Meier (KM) estimator, the Cox proportional hazards model estimator, and the local KM estimator. For the local KM estimator, we propose reducing the predictor dimension to avoid the curse of dimensionality, and we consider two popular dimension reduction tools: principal component analysis and sliced inverse regression. We found that the Cox proportional hazards model estimator performs best as a censoring probability estimator in both mean and median censored regression.
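
The ICPW idea can be made concrete with a small sketch:

1. Estimate the censoring survival function G(t) = P(C > t) by Kaplan-Meier, treating censorings as the "events".
2. Weight each uncensored observation by 1/G(T_i-) and each censored observation by 0.
3. Fit a weighted least-squares regression of log survival time.

The sketch covers only the plain KM estimator and mean regression. It assumes continuous times with no ties and censoring independent of the covariate. The Cox and local KM estimators and the dimension-reduction step compared in the paper are not shown, and the simulated data are hypothetical.

```python
import numpy as np

def km_censoring_survival(time, delta):
    """Kaplan-Meier estimate of G(t-) = P(C >= t) at each observed time (assumes no ties).
    delta = 1 for an observed event, 0 for a censored observation."""
    order = np.argsort(time)
    cens_sorted = 1 - delta[order]
    n = len(time)
    at_risk = n - np.arange(n)                  # number at risk at each sorted time
    factors = 1.0 - cens_sorted / at_risk       # censorings play the role of events
    g_after = np.cumprod(factors)               # G just after each sorted time
    g_before = np.concatenate([[1.0], g_after[:-1]])  # left limit G(t-)
    out = np.empty(n)
    out[order] = g_before
    return out

rng = np.random.default_rng(3)
n = 3000
x = rng.uniform(0, 2, n)
log_t = 1.0 + 0.5 * x + rng.normal(0, 0.3, n)   # hypothetical true model for log T
t_true = np.exp(log_t)
c = rng.exponential(scale=8.0, size=n)          # censoring independent of x
time = np.minimum(t_true, c)
delta = (t_true <= c).astype(float)

G = km_censoring_survival(time, delta)
w = delta / G                                   # ICPW weights; zero for censored cases
X = np.column_stack([np.ones(n), x])
beta_icpw = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * np.log(time)))
beta_naive = np.linalg.lstsq(X[delta == 1], np.log(time[delta == 1]), rcond=None)[0]
print("ICPW:", beta_icpw.round(3), "complete-case:", beta_naive.round(3))
```

The complete-case fit ignores the fact that long survival times are censored more often. The weighted fit corrects for this as long as G is estimated well, which is exactly the sensitivity the paper studies.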

A Study on the Prediction of Fuel Consumption of a Ship Using the Principal Component Analysis (주성분 분석기법을 이용한 선박의 연료소비 예측에 관한 연구)

  • Kim, Young-Rong;Kim, Gujong;Park, Jun-Bum
    • Journal of Navigation and Port Research, v.43 no.6, pp.335-343, 2019
  • As regulations on ship exhaust gas have recently been strengthened, many measures to reduce fuel consumption are under consideration. Among them, machine-learning models that predict fuel consumption from data collected on ships have been actively researched. However, many studies have not sufficiently considered how to select the main model parameters or how to process the collected data, and careless use of data can cause problems such as multicollinearity between variables. In this study, we propose a method that uses principal component analysis to address these problems when predicting a ship's fuel consumption. Principal component analysis was performed on operational data from a 13K TEU container ship, and the fuel consumption prediction model was built by regression analysis on the extracted components. The model achieved an R-squared of 82.99% on the test data, so it is expected to support operators' decision-making in voyage planning and to help monitor energy-efficient operation of ships during voyages.
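
A minimal sketch of the PCA-then-regression approach follows. It shows two properties the abstract relies on. First, principal component scores are mutually uncorrelated even when raw operational variables are highly collinear. Second, a regression on a few component scores can be scored by R-squared on held-out data. Several parts are hypothetical:

- the variables (speed, draft, a shaft-speed proxy nearly collinear with speed, and wind),
- the simulated data,
- the 400/100 train/test split,
- the choice of 3 components.

The 82.99% figure in the abstract comes from the paper's real data and is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
speed = rng.uniform(12, 22, n)                    # hypothetical speed (knots)
draft = rng.uniform(10, 15, n)                    # hypothetical draft (m)
rpm = 4.0 * speed + rng.normal(0, 1.0, n)         # hypothetical proxy, nearly collinear with speed
wind = rng.normal(0, 5, n)                        # hypothetical wind component
X = np.column_stack([speed, draft, rpm, wind])
fuel = 0.02 * speed ** 3 + 3.0 * draft + 0.3 * wind + rng.normal(0, 5, n)

train, test = np.arange(n) < 400, np.arange(n) >= 400

# Standardize with training statistics, then find principal axes on the training set only
mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
Xs = (X - mu) / sd
_, _, Vt = np.linalg.svd(Xs[train], full_matrices=False)
k = 3
scores = Xs @ Vt[:k].T

def design(S):
    """Add an intercept column to a matrix of component scores."""
    return np.column_stack([np.ones(len(S)), S])

gamma, *_ = np.linalg.lstsq(design(scores[train]), fuel[train], rcond=None)
pred = design(scores[test]) @ gamma
r2_test = 1 - np.sum((fuel[test] - pred) ** 2) / np.sum((fuel[test] - fuel[test].mean()) ** 2)

raw_corr = np.corrcoef(X[train], rowvar=False)[0, 2]   # speed vs rpm proxy
score_corr = np.corrcoef(scores[train], rowvar=False)
print(f"corr(speed, rpm) = {raw_corr:.3f}")
print("max |corr| between training PC scores:", np.abs(score_corr - np.eye(k)).max().round(6))
print(f"test R^2 = {r2_test:.3f}")
```

Fitting the standardization and the principal axes on the training rows only keeps the test R-squared honest.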

Non-linear PLS based on non-linear principal component analysis and neural network (비선형 주성분해석과 신경망에 기반한 비선형 PLS)

  • 손정현;정신호;송상옥;윤인섭
    • Institute of Control, Robotics and Systems (ICROS): Conference Proceedings, 2000.10a, p.394, 2000
  • This paper proposes a new nonlinear partial least squares (PLS) method that extends linear PLS. The proposed nonlinear PLS uses a self-organizing feature map as the PLS outer relation and a multilayer neural network as the PLS inner regression.
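
Since the proposed method generalizes linear PLS, it may help to see the linear baseline. In NIPALS-style PLS1, each latent component comes from two relations:

- an outer relation, which decomposes X into scores t and loadings p, with a weight vector chosen to maximize covariance with y;
- an inner relation, which is a linear regression of y on t.

The sketch below is a textbook single-response version on hypothetical data. The paper replaces the outer relation with a self-organizing feature map and the inner regression with a multilayer neural network; neither is shown here.

```python
import numpy as np

def pls1_nipals(X, y, n_comp):
    """Linear PLS1 via NIPALS deflation on centered data.
    Returns coefficients B such that y_hat = (X - X.mean(0)) @ B + y.mean()."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = X.T @ y
        w /= np.linalg.norm(w)          # outer relation: X weight vector
        t = X @ w                       # X scores
        tt = t @ t
        p = X.T @ t / tt                # X loadings
        q = (y @ t) / tt                # inner relation: linear regression of y on t
        X = X - np.outer(t, p)          # deflate X
        y = y - q * t                   # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    return W @ np.linalg.solve(P.T @ W, Q)

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(0, 0.1, 100)

B2 = pls1_nipals(X, y, n_comp=2)        # reduced-rank PLS fit
B4 = pls1_nipals(X, y, n_comp=4)        # as many components as predictors
Xc, yc = X - X.mean(axis=0), y - y.mean()
B_ols = np.linalg.lstsq(Xc, yc, rcond=None)[0]
print(B2.round(3), B4.round(3), B_ols.round(3))
```

With as many components as (full-rank) predictors, linear PLS reproduces ordinary least squares. Comparing B4 with B_ols is therefore a quick sanity check of the implementation.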

Source Characterization of Suspended Particulate Matter in Taegu Area, Using Principal Component Analysis Coupled with Multiple Regression (주성분/중회귀분석을 이용한 대구지역 대기중 부유분진의 발생원별 특성평가)

  • 백성옥;황승만
    • Journal of Korean Society for Atmospheric Environment, v.8 no.3, pp.179-190, 1992
  • This study was carried out to characterize the sources of atmospheric total suspended particulates (TSP) in urban and suburban areas of metropolitan Taegu. The sources were tentatively identified by a multivariate technique, principal component analysis (PCA), and the source contributions to atmospheric TSP concentrations were then estimated by stepwise multiple regression analysis. Five sources were identified in the urban area of Taegu (soil dust resuspension, fuel combustion, secondary aerosol, traffic-related aerosol, and refuse burning), while four sources were found to be significant in the suburban area: fuel combustion/secondary aerosol, soil dust resuspension, traffic-related aerosol, and wood/agricultural burning. Soil dust resuspension appeared to be the largest contributor to atmospheric TSP in both areas. Source apportionment of the extractable organic matter (EOM) was also carried out for the Taegu data. EOM was determined according to solvent polarity: cyclohexane (non-polar), dichloromethane (semi-polar), and acetone (polar). In addition, the source profiles for TSP in the Taegu area were estimated using a PCA-based algorithm, and their validity was tentatively evaluated by comparison with data in the literature.
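
PCA-plus-regression source apportionment is often carried out with absolute principal component scores (APCS). The abstract does not say whether the authors used exactly this variant, so the sketch below is an assumption. It works in four steps:

1. Standardize the species concentrations.
2. Shift the component scores by the score of an artificial zero-concentration sample.
3. Regress TSP on these absolute scores.
4. Take each source's mean contribution as its coefficient times its mean absolute score.

The two source strengths, six species profiles, and noise levels are hypothetical. Stepwise selection and factor rotation, both common in practice, are omitted.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
# Hypothetical non-negative source strengths (e.g. soil dust, combustion), in TSP mass units
S = rng.gamma(shape=2.0, scale=[20.0, 10.0], size=(n, 2))
# Hypothetical source profiles: mass of each of 6 species per unit source strength
F = np.array([[0.30, 0.20, 0.10, 0.01, 0.01, 0.02],
              [0.01, 0.02, 0.05, 0.25, 0.20, 0.15]])
C = (S @ F) * rng.lognormal(0.0, 0.1, size=(n, 6))    # species concentrations with noise
tsp = S.sum(axis=1) + rng.normal(0, 2, n)              # hypothetical measured TSP

mu, sd = C.mean(axis=0), C.std(axis=0)
Z = (C - mu) / sd
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
k = 2
V = Vt[:k].T
scores = Z @ V
zero_scores = (-mu / sd) @ V          # scores of an artificial all-zero sample
apcs = scores - zero_scores           # absolute principal component scores

A = np.column_stack([np.ones(n), apcs])
coef, *_ = np.linalg.lstsq(A, tsp, rcond=None)
contrib = coef[1:] * apcs.mean(axis=0)  # mean TSP contribution of each component
print("intercept:", coef[0].round(2), "contributions:", contrib.round(2),
      "mean TSP:", tsp.mean().round(2))
```

Because the regression includes an intercept, the intercept plus the component contributions always sums to the mean measured TSP. This is what makes the contributions read as an apportionment of the total.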

Feature Extraction via Sparse Difference Embedding (SDE)

  • Wan, Minghua;Lai, Zhihui
    • KSII Transactions on Internet and Information Systems (TIIS), v.11 no.7, pp.3594-3607, 2017
  • Traditional feature extraction methods such as principal component analysis (PCA) cannot capture the local structure of the samples, and locally linear embedding (LLE) cannot capture their global structure. Moreover, a common drawback of existing PCA and LLE algorithms is that they do not handle the sparsity problem of the samples well. Therefore, by integrating the globality of PCA and the locality of LLE with a sparse constraint, we developed an improved, unsupervised difference algorithm called Sparse Difference Embedding (SDE) for dimensionality reduction of high-dimensional data in small-sample-size problems. Unlike the existing PCA and LLE algorithms, SDE seeks a set of optimal projections that not only preserve intraclass locality and maximize interclass globality but also use Lasso regression to obtain a sparse transformation matrix. This characteristic makes SDE more intuitive and more powerful than PCA and LLE. Finally, the proposed algorithm was evaluated in experiments on the Yale and AR face image databases and the USPS handwritten digit database. The experimental results show that SDE outperforms PCA, LLE, and UDP owing to its sparse discriminating characteristics, which indicates that SDE is an effective method for face recognition.
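
SDE's sparsity comes from a Lasso regression step. The full SDE objective, which combines PCA-style global scatter with LLE-style local structure, is not specified here in enough detail to reproduce. As a generic illustration of how an L1 penalty yields exactly-zero coefficients, the sketch below solves the standard Lasso problem min (1/2n)||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent with soft-thresholding. The data and the penalty lam = 0.1 are hypothetical, and this is not the authors' algorithm.

```python
import numpy as np

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1 (no intercept)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                          # residual y - X b, with b starting at 0
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]           # remove feature j's current contribution
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]           # add it back with the updated coefficient
    return b

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 10))
true_b = np.zeros(10)
true_b[[0, 3]] = [3.0, -2.0]              # only two informative features
y = X @ true_b + rng.normal(0, 0.5, 200)
b_hat = lasso_cd(X, y, lam=0.1)
print(np.round(b_hat, 2))
```

Most uninformative coefficients end up exactly zero rather than merely small. That property is what allows a Lasso step to produce a sparse transformation matrix.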

Effects of Attitudes Toward Reasons for which Abortion is Permitted on Needs for Abortion Prevention Policies among Female Students (낙태허용 사유에 대한 여학생의 인식이 낙태예방정책 요구도에 미치는 영향)

  • Yoo, Gye-Sook
    • Journal of Families and Better Life, v.30 no.3, pp.1-11, 2012
  • The purpose of this study is to analyze how attitudes toward the reasons for which abortion is permitted affect the perceived need for abortion prevention policies among 232 unmarried female students at middle schools, high schools, and universities in Seoul. Respondents completed a self-administered questionnaire, and the data were analyzed with principal component analysis, t-tests, Pearson's correlations, and hierarchical multiple regression. The major findings were as follows. First, the principal component analysis identified three groups of reasons for which abortion is permitted: reasons under the Maternal and Child Health Law, socioeconomic reasons, and normatively unqualified reasons. Second, the students showed permissive attitudes toward abortion for reasons under the Maternal and Child Health Law, disapproving attitudes toward abortion for socioeconomic reasons, and neutral attitudes toward abortion for normatively unqualified reasons. Students also reported a high level of need for abortion prevention policies. Finally, hierarchical regression analyses revealed that the students' attitudes toward the reasons for which abortion is permitted significantly predicted their need for abortion prevention policies after controlling for sociodemographic characteristics. The implications of these results are discussed.
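
Hierarchical regression of the kind reported above works in two blocks. Control variables are entered first and the predictors of interest second, and the analysis tests whether the second block adds explained variance. The sketch below computes R-squared for both steps, the change in R-squared, and the F-change statistic F = (dR2/dk) / ((1 - R2_2)/(n - k2 - 1)). Only the sample size of 232 comes from the study. The variable names and simulated data are hypothetical and simply mirror the study's structure: sociodemographic controls first, then three attitude component scores.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on X with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

def hierarchical_step(X1, X2, y):
    """Return (R2 step 1, R2 step 2, delta R2, F-change) for adding block X2 to block X1."""
    n = len(y)
    r2_1 = r_squared(X1, y)
    X_full = np.column_stack([X1, X2])
    r2_2 = r_squared(X_full, y)
    k2, dk = X_full.shape[1], X2.shape[1]          # total predictors, predictors added
    f_change = ((r2_2 - r2_1) / dk) / ((1 - r2_2) / (n - k2 - 1))
    return r2_1, r2_2, r2_2 - r2_1, f_change

rng = np.random.default_rng(8)
n = 232                                            # sample size reported in the study
controls = np.column_stack([rng.integers(0, 3, n),     # hypothetical school level (0-2)
                            rng.normal(size=n)])       # hypothetical socioeconomic score
attitudes = rng.normal(size=(n, 3))                    # hypothetical attitude component scores
need = 0.2 * controls[:, 1] + attitudes @ np.array([0.5, -0.3, 0.2]) + rng.normal(0, 1, n)

r2_1, r2_2, delta, f_change = hierarchical_step(controls, attitudes, need)
print(f"R2 step 1 = {r2_1:.3f}, step 2 = {r2_2:.3f}, dR2 = {delta:.3f}, F-change = {f_change:.1f}")
```

The F-change statistic is compared with an F distribution with (dk, n - k2 - 1) degrees of freedom to judge whether the attitude block predicts need "after controlling for" the first block.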