• Title/Summary/Keyword: 주성분회귀분석 (principal component regression analysis)

Suggestion of starting pitcher ability index in Korea baseball - Focusing on the sabermetrics statistics WAR (한국프로야구에서 선발투수의 투수능력지수 제안 - 대체선수대비승수 (WAR)을 중심으로)

  • Kim, Hyeon-Gyu;Lee, Jea-Young
    • Journal of the Korean Data and Information Science Society / v.28 no.4 / pp.863-874 / 2017
  • Wins above replacement (WAR) is the most commonly used of the many sabermetric statistics that measure baseball players' abilities. The advantage of WAR is that it allows the performances of players to be compared even when they have different roles, such as pitcher and hitter. However, WAR is difficult to obtain from common records. Thus, in this paper, we calculated sabermetric variables from Korean professional baseball records for the past three years (2014-2016). Using these variables, we suggest a starting pitcher ability index that can replace WAR. The index was calculated by arithmetic mean, weighted average, and principal component regression. Each result was then compared with WAR and the most closely related method was selected, which should be useful for identifying starting pitcher ability.

Feature selection for text data via sparse principal component analysis (희소주성분분석을 이용한 텍스트데이터의 단어선택)

  • Won Son
    • The Korean Journal of Applied Statistics / v.36 no.6 / pp.501-514 / 2023
  • When analyzing high dimensional data such as text data, if we input all the variables as explanatory variables, statistical learning procedures may suffer from over-fitting problems. Furthermore, computational efficiency can deteriorate with a large number of variables. Dimensionality reduction techniques such as feature selection or feature extraction are useful for dealing with these problems. The sparse principal component analysis (SPCA) is one of the regularized least squares methods which employs an elastic net-type objective function. The SPCA can be used to remove insignificant principal components and identify important variables from noisy observations. In this study, we propose a dimension reduction procedure for text data based on the SPCA. Applying the proposed procedure to real data, we find that the reduced feature set maintains sufficient information in text data while the size of the feature set is reduced by removing redundant variables. As a result, the proposed procedure can improve classification accuracy and computational efficiency, especially for some classifiers such as the k-nearest neighbors algorithm.

Design of Regression Model and Pattern Classifier by Using Principal Component Analysis (주성분 분석법을 이용한 회귀다항식 기반 모델 및 패턴 분류기 설계)

  • Roh, Seok-Beom;Lee, Dong-Yoon
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.10 no.6 / pp.594-600 / 2017
  • A new design methodology for prediction models and pattern classifiers, based on the dimension reduction algorithm called principal component analysis, is introduced in this paper. Principal component analysis is a dimension reduction technique used to reduce the dimension of the input space and extract good features from the original input variables. The extracted features are supplied to the prediction model and pattern classifier as input variables. The introduced prediction model and pattern classifier are based on very simple regression, which is the key point of the paper. Their structural simplicity reduces the over-fitting problem. To validate the proposed prediction model and pattern classifier, several machine learning data sets are used.

The Study of Korean Manufacturing Industry Wage : Principal Components Regression Analysis (한국 제조업의 임금결정에 대한 연구 : 외환위기 전·후를 중심으로)

  • Oh, Yu-Jin;Park, Sung-Joon;Kim, Yu-Seop
    • Journal of Labour Economics / v.28 no.1 / pp.61-82 / 2005
  • We investigate wage differentials in the Korean manufacturing industry, as well as factors affecting structural change in wage determination between the pre- and post-financial crisis regimes. We use the 1995 and 1999 data from the Survey Report on the Wage Structure (SRWS) from the Ministry of Labor. Principal components regression analysis is used to tackle multicollinearity. We employ factor analysis to reduce a set of variables, containing observed and latent variables, to a smaller number. Our empirical investigation provides evidence of changes in wage structure between 1995 and 1999. In 1995, the job quality factor is the most critical in the determination of wages, while in 1999, the industry attributes factor has the greatest impact on wages.
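
    The abstract above names principal components regression as a remedy for multicollinearity. The following is a minimal sketch of that general technique, not the authors' procedure: the data are synthetic, and the helper names `pcr_fit` and `pcr_predict` are hypothetical. Two predictors are made nearly collinear, the smallest-variance component is dropped, and the response is regressed on the remaining component scores.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal components regression: regress y on the leading PC scores of X."""
    x_mean, x_std = X.mean(axis=0), X.std(axis=0)
    Z = (X - x_mean) / x_std                      # standardize predictors
    # Rows of Vt are the principal directions of the standardized data.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    V = Vt[:n_components].T                       # (p, k) loadings
    scores = Z @ V                                # (n, k) uncorrelated scores
    gamma, *_ = np.linalg.lstsq(scores, y - y.mean(), rcond=None)
    beta_std = V @ gamma                          # coefficients on standardized X
    return {"mean": x_mean, "std": x_std, "beta": beta_std, "intercept": y.mean()}

def pcr_predict(model, X):
    Z = (X - model["mean"]) / model["std"]
    return model["intercept"] + Z @ model["beta"]

# Synthetic data (an assumption for illustration): x2 is almost a copy of x1.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 + 2.0 * x2 - 1.0 * x3 + 0.1 * rng.normal(size=n)

model = pcr_fit(X, y, n_components=2)             # drop the unstable third component
pred = pcr_predict(model, X)
r2 = 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
print(model["beta"], r2)
```

    Dropping the low-variance component forces the two collinear predictors to share nearly equal coefficients, which is the stabilizing effect the technique is used for. How many components to keep is a modeling choice that this sketch does not address.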

Establishment of Strategy for Management of Technology Using Data Mining Technique (데이터 마이닝을 통한 기술경영 전략 수립에 관한 연구)

  • Lee, Junseok;Lee, Joonhyuck;Kim, Gabjo;Park, Sangsung;Jang, Dongsik
    • Journal of the Korean Institute of Intelligent Systems / v.25 no.2 / pp.126-132 / 2015
  • Technology forecasting is about understanding the future status of a specific technology, based on current data about that technology. It is useful when planning technology management strategies. These days, it is common for countries, companies, and researchers to establish R&D directions and strategies by drawing on experts' opinions. However, this qualitative method of technology forecasting is costly and time consuming, since it requires collecting a variety of opinions and analyses from many experts. To deal with these limitations, quantitative methods of technology forecasting are being studied to secure objective forecast results and support the R&D decision making process. This paper suggests a methodology of technology forecasting based on quantitative analysis. The methodology consists of data collection, principal component analysis, and technology forecasting by logistic regression, which is one of the data mining techniques. In this research, patent documents related to autonomous vehicles are collected. Then, the texts of the patent documents are extracted by a text mining technique to construct an appropriate form for analysis. After principal component analysis, logistic regression is performed using the principal component scores. On the basis of this result, it is possible to analyze the state of R&D development and to forecast technology.
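
    The pipeline described above (principal component analysis followed by logistic regression on the component scores) can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: the matrix `X` is a synthetic stand-in for a document-term matrix, the binary `labels` are invented, and the logistic regression is fit with plain gradient descent to keep the example dependent on NumPy only.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 300, 20, 3
# Synthetic stand-in for a document-term matrix: k latent topics drive p keyword features.
latent = rng.normal(size=(n, k))
X = latent @ rng.normal(size=(k, p)) + 0.3 * rng.normal(size=(n, p))
# Synthetic binary label driven by the latent topics (an assumption for illustration only).
labels = (latent @ np.array([1.5, -1.0, 0.5]) > 0).astype(float)

# Step 1: PCA via SVD of the centered matrix; keep the first k component scores.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:k].T
scores = scores / scores.std(axis=0)          # rescale so one step size suits all columns

# Step 2: logistic regression on the scores, fit by plain gradient descent.
A = np.column_stack([np.ones(n), scores])     # add intercept column
w = np.zeros(k + 1)
for _ in range(2000):
    prob = 1.0 / (1.0 + np.exp(-(A @ w)))
    w -= 0.1 * A.T @ (prob - labels) / n      # gradient step on the mean log-loss

prob = 1.0 / (1.0 + np.exp(-(A @ w)))
accuracy = np.mean((prob > 0.5) == (labels == 1.0))
print(accuracy)
```

    The reported accuracy is measured on the training data only, so it says nothing about forecasting performance. A real study would hold out data and choose the number of components deliberately.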

Degradation-Based Remaining Useful Life Analysis for Predictive Maintenance in a Steel Galvanizing Kettle (철강 도금로의 예지보전을 위한 열화 기반 잔존수명 분석)

  • Shin, Joon Ho;Kim, Chang Ouk
    • Journal of the Korea Convergence Society / v.10 no.12 / pp.271-280 / 2019
  • The smart factory, a critical part of digital transformation, enables data-driven decision making through monitoring, analysis, and prediction. Predictive maintenance is a key element of the smart factory, and the need for it is increasing. The purpose of this study is to analyze the degradation characteristics of a galvanizing kettle for the steel plating process and to predict its remaining useful life (RUL) for predictive maintenance. Correlation analysis, multiple regression, and principal component regression were used to analyze the process factors. To identify the degradation trend, a proposed rolling window was used. The degradation trend was observed to depend on environmental temperature as well as production factors. The method proposed in this study is expected to serve as an example of how to identify the degradation trend of a facility and to enable more consistent predictive maintenance.

Study on the Local Factors Affecting Availability of Car-Sharing in Seoul (서울시의 카셰어링 이용도에 대한 지역적 요인특성분석)

  • Choi, Hyunsu;Park, Juntae
    • Journal of the Korean Society for Railway / v.17 no.5 / pp.381-389 / 2014
  • This research focuses on the current trend of 'Sharing Transportation' to clarify the regional factors having a decisive effect on the use of Car Sharing. To accomplish this, we built a database of the regional characteristics of Car Sharing spots based on railway stations in Seoul and analyzed the primary regional factors affecting Car Sharing usage. As a result, we identified decisive factors affecting the use of Car Sharing. This research can be used to establish strategies and effective measures that support the use of Car Sharing and sustainable development with respect to issues of motorization.

Analysis of the Causes of Web Breaks in the Papermaking Process Using Artificial Neural Networks (인공 신경망 기법을 이용한 제지공정의 지절 원인 분석)

  • 이진희;이학래
    • Proceedings of the Korea Technical Association of the Pulp and Paper Industry Conference / 2001.04a / pp.168-168 / 2001
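  • Web breaks in the papermaking process are among the most serious process troubles, arising from the combined action of many process variables. Beyond reducing output, a web break demands considerable time, cost, and effort for recovery and cleanup, production restart, and restabilization of the process, so it greatly lowers process efficiency and productivity. Because the phenomenon is complex, it has not been easy to approach or resolve, yet the need to do so keeps growing. In this study, we examined the feasibility of investigating web breaks in the papermaking process using artificial neural network simulation, a predictive analysis technique recently recognized in various industries as effective for detecting and diagnosing faults in complex processes, together with principal component analysis, a common statistical technique. An artificial neural network imitates the stimulus-response-learning process of the human brain and uses computer simulation to analyze the nonlinear mapping between the input vectors and output states of real-world phenomena. The neural units in the network, which recognize phenomena through learning, work in parallel and can therefore draw inferences and judgments from large amounts of data quickly and accurately, which also makes the method attractive for real-time pattern recognition and classification applications. Among such techniques, to overcome the limitations of the perceptron, this study built a multilayer network with one or more hidden layers between the input layer and the output layer and used the back propagation learning algorithm, which adjusts the connection weights in the direction that minimizes the error function over all input patterns. Using process factors presumed to cause web breaks as variables, we designed a high-accuracy artificial neural network by varying the learning rate and momentum constant and adjusting the number of hidden layers and the number of output-layer neurons. For comparison with the artificial neural network, we also performed principal component regression analysis, a commonly used statistical technique, on the same process data. Principal component analysis reduces and summarizes the multidimensional variables of multivariate data obtained for several response variables, simplifying the dimension, and analyzes the complex structure among mutually correlated response variables. In this presentation, we use process data with artificial neural networks and principal component analysis to estimate more realistically the factors that affect the occurrence of process troubles and, by seeking countermeasures, to introduce ways of minimizing them.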

An Empirical Study on the Activation Approach for the Competitive Power of Korean Shipping Company in the Korea-China Liner Routes (국적선사의 경쟁력 강화를 위한 한중정기항로 활성화 방안에 대한 실증연구)

  • Lee, Yong-Ho
    • Journal of Navigation and Port Research / v.27 no.2 / pp.163-170 / 2003
  • This empirical study takes the activation approach for the competitive power of Korean shipping companies in the Korea-China liner routes. Data for this study were collected from Korean, Chinese, and third-flag shipping companies through 500 questionnaires. The data from 250 respondents were analyzed statistically to verify the hypotheses and to derive a regression equation that could predict the level of influence of the determinants of competitive advantage for Korean shipping companies on the Korea-China liner shipping routes. Factor analysis, Cronbach's alpha, principal analysis, and multiple regression analysis were used to test the hypotheses of the empirical study.

Analysis and Classification of Acoustic Emission Signals During Wood Drying Using the Principal Component Analysis (주성분 분석을 이용한 목재 건조 중 발생하는 음향방출 신호의 해석 및 분류)

  • Kang, Ho-Yang;Kim, Ki-Bok
    • Journal of the Korean Society for Nondestructive Testing / v.23 no.3 / pp.254-262 / 2003
  • In this study, acoustic emission (AE) signals due to surface cracking and moisture movement in flat-sawn boards of oak (Quercus variabilis) during drying under ambient conditions were analyzed and classified using principal component analysis. The AE signals corresponding to surface cracking were higher in peak amplitude and peak frequency, and shorter in rise time, than those corresponding to moisture movement. To reduce the multicollinearity among AE features and to extract the significant AE parameters, correlation analysis was performed. Over 99% of the variance of the AE parameters could be accounted for by the first four principal components. The classification feasibility and success rate were investigated using two statistical classifiers, one with six independent variables (AE parameters) and one with six principal components. The statistical classifier using AE parameters showed a success rate of 70.0%. The statistical classifier using principal components showed a success rate of 87.5%, which was considerably higher than that of the classifier using AE parameters.
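
    The statement above that the leading components account for over 99% of the variance refers to the cumulative explained-variance ratio. A minimal sketch of how that ratio is typically computed follows. The six features here are synthetic and hypothetical, not the paper's AE parameters, and the 99% threshold is borrowed from the abstract purely as an example cutoff.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical stand-in for six correlated signal features driven by two latent sources.
latent = rng.normal(size=(400, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(400, 6))

Z = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize each feature
# Eigenvalues of the covariance of Z are the component variances; sort descending.
eigvals = np.linalg.eigvalsh(np.cov(Z, rowvar=False))[::-1]
explained = eigvals / eigvals.sum()            # share of variance per component
cumulative = np.cumsum(explained)
# Smallest number of components whose cumulative share reaches the 99% threshold.
n_keep = int(np.searchsorted(cumulative, 0.99) + 1)
print(np.round(cumulative, 4), n_keep)
```

    Standardizing first means the eigenvalues come from the correlation structure, so features measured in different units contribute on an equal footing.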