• Title/Summary/Keyword: elastic net (엘라스틱넷)

Search results: 3

Feature selection for text data via sparse principal component analysis (희소주성분분석을 이용한 텍스트데이터의 단어선택)

  • Won Son
    • The Korean Journal of Applied Statistics / v.36 no.6 / pp.501-514 / 2023
  • When analyzing high-dimensional data such as text, using all variables as explanatory variables can cause statistical learning procedures to overfit, and computational efficiency can deteriorate as the number of variables grows. Dimensionality reduction techniques such as feature selection and feature extraction help address these problems. Sparse principal component analysis (SPCA) is a regularized least-squares method that employs an elastic net-type objective function; it can be used to remove insignificant principal components and to identify important variables from noisy observations. In this study, we propose an SPCA-based dimension reduction procedure for text data. Applying the proposed procedure to real data, we find that the reduced feature set retains sufficient information from the text data while its size is reduced by removing redundant variables. As a result, the proposed procedure can improve classification accuracy and computational efficiency, especially for some classifiers such as the k-nearest neighbors algorithm.
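
The abstract characterizes SPCA only as an elastic net-type regularized least-squares method. As a minimal sketch, assuming the SPCA formulation of Zou, Hastie and Tibshirani (alternating elastic-net regressions for the sparse loadings B, plus an orthogonal Procrustes update for A), the code below computes sparse loadings for a toy document-term count matrix and keeps the words with a nonzero loading on any component. The function names (`elastic_net_cd`, `sparse_pca`, `selected_features`), the penalty values, and the synthetic data are illustrative assumptions, not the paper's procedure or data.

```python
import numpy as np

def elastic_net_cd(X, y, lam1, lam2, n_iter=100):
    """Coordinate descent for min_b ||y - X b||^2 + lam2*||b||^2 + lam1*||b||_1 (lam2 > 0)."""
    b = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)
    r = np.array(y, dtype=float)               # residual y - X b (b starts at zero)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            r += X[:, j] * b[j]                # residual with feature j removed
            rho = X[:, j] @ r
            # soft-threshold (lasso part), then divide by the ridge-inflated curvature
            b[j] = np.sign(rho) * max(abs(rho) - lam1 / 2, 0.0) / (col_sq[j] + lam2)
            r -= X[:, j] * b[j]
    return b

def sparse_pca(X, k, lam1, lam2=1e-2, n_outer=30):
    """Zou-Hastie-Tibshirani-style SPCA; returns p x k loadings with unit-norm (or zero) columns."""
    Xc = X - X.mean(axis=0)
    A = np.linalg.svd(Xc, full_matrices=False)[2][:k].T   # start from ordinary PCA loadings
    B = np.zeros_like(A)
    for _ in range(n_outer):
        for j in range(k):                                # elastic-net fit of each component
            B[:, j] = elastic_net_cd(Xc, Xc @ A[:, j], lam1, lam2)
        U, _, Vt = np.linalg.svd(Xc.T @ Xc @ B, full_matrices=False)
        A = U @ Vt                                        # orthogonal Procrustes update
    norms = np.linalg.norm(B, axis=0)
    return B / np.where(norms > 0, norms, 1.0)

def selected_features(loadings, tol=1e-10):
    """Indices of words with a nonzero loading on any sparse component."""
    return np.flatnonzero(np.abs(loadings).max(axis=1) > tol).tolist()

# Hypothetical toy document-term counts: 40 documents, 6 words.
rng = np.random.default_rng(0)
topic = rng.poisson(3.0, size=(40, 1))
X = np.hstack([topic + rng.poisson(1.0, (40, 3)),   # words 0-2 share a latent topic
               rng.poisson(2.0, (40, 2)),            # words 3-4 vary independently
               np.ones((40, 1))])                    # word 5 appears once in every document
loadings = sparse_pca(X, k=2, lam1=5.0)
print(selected_features(loadings))
```

Raising `lam1` zeroes out more loadings and therefore drops more words. In this toy matrix, word 5 has zero variance after centering, so it is never selected.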

An Improved RSR Method to Obtain the Sparse Projection Matrix (희소 투영행렬 획득을 위한 RSR 개선 방법론)

  • Ahn, Jung-Ho
    • Journal of Digital Contents Society / v.16 no.4 / pp.605-613 / 2015
  • This paper addresses the problem of making the projection matrix sparse in pattern recognition methods. In embedded systems, program size is often restricted, and developed programs often include constant data. For example, many pattern recognition programs use a projection matrix for dimension reduction. Because very high-dimensional feature vectors are often extracted to improve recognition performance, this projection matrix can be very large. The recently proposed rotated sparse regression (RSR) method [1] has been shown to be one of the best algorithms for obtaining a sparse matrix. We propose three methods to improve RSR: outlier removal, sampling, and elastic net RSR (E-RSR), in which the penalty term of the RSR objective function is replaced by that of elastic net regression. The experimental results show that, compared with the original RSR method, the proposed methods are very effective and improve the sparsity rate dramatically without sacrificing the recognition rate.
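
The abstract does not describe RSR itself, so the sketch below does not reproduce it (in particular, no rotation step is modelled). Under stated assumptions, it only illustrates what swapping a lasso (L1) penalty for an elastic-net penalty changes in a generic proximal-gradient (ISTA) solver that learns a sparse matrix W approximating a dense projection P, so that XW ≈ XP: the shrinkage step becomes soft-thresholding followed by division by 1 + lam2. The function names, the `sparsity_rate` metric (fraction of exactly-zero entries), and the random data are hypothetical.

```python
import numpy as np

def prox_l1(v, thr):
    """Soft-thresholding: proximal map of thr*||w||_1 (the lasso penalty)."""
    return np.sign(v) * np.maximum(np.abs(v) - thr, 0.0)

def prox_elastic_net(v, thr1, thr2):
    """Proximal map of thr1*||w||_1 + (thr2/2)*||w||_2^2: soft-threshold, then shrink by 1/(1+thr2)."""
    return prox_l1(v, thr1) / (1.0 + thr2)

def sparse_projection(X, Y, lam1, lam2=0.0, n_iter=500):
    """ISTA for min_W 0.5*||Y - X W||_F^2 + lam1*||W||_1 + (lam2/2)*||W||_F^2.
    lam2 = 0 gives the lasso-penalised fit; lam2 > 0 gives the elastic-net variant."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2     # 1 / Lipschitz constant of the smooth term
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = X.T @ (X @ W - Y)
        W = prox_elastic_net(W - step * grad, step * lam1, step * lam2)
    return W

def sparsity_rate(W):
    """Hypothetical metric: fraction of exactly-zero entries of W."""
    return float(np.mean(W == 0))

# Hypothetical data: 100 feature vectors of dimension 20 and a dense 20 -> 3 projection.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
P_dense = rng.standard_normal((20, 3))
Y = X @ P_dense                                # outputs of the dense projection
for lam2 in (0.0, 5.0):
    W = sparse_projection(X, Y, lam1=20.0, lam2=lam2)
    print(lam2, sparsity_rate(W))
```

The printed rates depend on `lam1`, `lam2`, and the data; this toy setup is not meant to reproduce the paper's reported improvement.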

Case study: Selection of the weather variables influencing the number of pneumonia patients in Daegu Fatima Hospital (사례연구: 대구 파티마 병원 폐렴 입원 환자 수에 영향을 미치는 날씨 변수 선택)

  • Choi, Sohyun;Lee, Hag Lae;Park, Chungun;Lee, Kyeong Eun
    • Journal of the Korean Data and Information Science Society / v.28 no.1 / pp.131-142 / 2017
  • Hospital admissions for pneumonia tend to increase every year; moreover, pneumonia, the fifth leading cause of death among older adults, is one of the top diseases in terms of hospitalization rate. Although pneumonia is caused mainly by bacteria and viruses, weather is also related to its occurrence. The candidate weather variables are humidity, amount of sunshine, diurnal temperature range, daily mean temperature, and particle density. Because pneumonia occurs with a delay after the relevant weather conditions, lagged weather variables are also considered. Year, holiday, and seasonal effects are additionally included. We select the variables that influence the occurrence of pneumonia using penalized generalized linear models.
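
The abstract does not specify the GLM family or the penalty. As a sketch only, assuming a Poisson log-link model for daily admission counts with an elastic-net penalty, the code below builds lagged weather columns and fits the penalized model by proximal gradient descent with backtracking; columns with nonzero coefficients are the selected variables. The helper names, the lag set, the penalty values, and the synthetic weather and admission series are all hypothetical, and year, holiday, and seasonal effects are omitted for brevity.

```python
import numpy as np

def prox_enet(v, thr1, thr2):
    """Proximal map of thr1*||b||_1 + (thr2/2)*||b||_2^2."""
    return np.sign(v) * np.maximum(np.abs(v) - thr1, 0.0) / (1.0 + thr2)

def add_lags(W, lags):
    """Columns W[t - L] for each lag L in lags; the first max(lags) days are dropped."""
    m = max(lags)
    return np.hstack([W[m - L: len(W) - L] for L in lags])

def poisson_elastic_net(X, y, lam, alpha=0.5, n_iter=500):
    """Proximal gradient with backtracking for the Poisson log-link GLM:
    mean(exp(eta) - y*eta) + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2), eta = b0 + X b.
    The intercept b0 is not penalised."""
    n = len(y)
    def loss(b0, b):
        eta = b0 + X @ b
        return np.mean(np.exp(eta) - y * eta)
    b0, b, t = np.log(y.mean()), np.zeros(X.shape[1]), 1.0
    for _ in range(n_iter):
        resid = np.exp(b0 + X @ b) - y            # mu - y
        g0, g, f = resid.mean(), X.T @ resid / n, loss(b0, b)
        while True:                                # halve the step until the quadratic bound holds
            nb0 = b0 - t * g0
            nb = prox_enet(b - t * g, t * lam * alpha, t * lam * (1 - alpha))
            d0, d = nb0 - b0, nb - b
            if loss(nb0, nb) <= f + g0 * d0 + g @ d + (d0 ** 2 + d @ d) / (2 * t):
                break
            t *= 0.5
        b0, b = nb0, nb
    return b0, b

# Hypothetical daily series for 400 days: humidity, sunshine, diurnal temperature range.
rng = np.random.default_rng(1)
weather = rng.standard_normal((400, 3))
lags = [0, 1, 2, 3]
X = add_lags(weather, lags)                       # column 3*i + v holds variable v at lag lags[i]
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = rng.poisson(np.exp(1.0 + 0.4 * X[:, 8]))      # synthetic counts driven only by variable 2 at lag 2
b0, b = poisson_elastic_net(X, y, lam=0.05)
print(np.flatnonzero(b))                          # indices of selected (nonzero) columns
```

Leaving the intercept unpenalized keeps the baseline admission rate free, so a very large `lam` shrinks every weather coefficient to zero without distorting the overall level.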