• Title/Summary/Keyword: Lasso (라쏘)


Graphical method for evaluating the impact of influential observations in high-dimensional data (고차원 자료에서 영향점의 영향을 평가하기 위한 그래픽 방법)

  • Ahn, Sojin;Lee, Jae Eun;Jang, Dae-Heung
    • Journal of the Korean Data and Information Science Society / v.28 no.6 / pp.1291-1300 / 2017
  • In high-dimensional data, the number of variables is much larger than the number of observations. In this case, influential observations can have a very large impact on the regression coefficient estimates. Jang and Anderson-Cook (2017) suggested the LASSO influence plot. In this paper, we propose the LASSO influence plot, the LASSO variable selection ranking plot, and the three-dimensional LASSO influence plot as graphical methods for evaluating the impact of influential observations in high-dimensional data. Using two real high-dimensional data examples, we apply these graphical methods as regression diagnostic tools for finding influential observations, and we find that they can identify influential observations.
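The LASSO influence plot rests on a case-deletion idea: refit the LASSO without each observation in turn and measure how far the coefficient vector moves. Below is a minimal sketch in pure Python. It assumes three things: a squared-error LASSO without an intercept, fitted by cyclic coordinate descent; a Euclidean distance between the full-data and leave-one-out coefficient vectors as the influence score; and a toy data set with a planted outlier. The function names (`lasso_cd`, `case_deletion_influence`) and the distance measure are illustrative choices and are not necessarily the exact statistic used by Jang and Anderson-Cook (2017).

```python
import math
import random


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/(2n))*||y - Xb||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    r = list(y)  # residual y - Xb; b starts at zero
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column keeps a zero coefficient
            # (1/n) * x_j^T (residual with coordinate j's contribution added back)
            rho = sum(X[i][j] * (r[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


def case_deletion_influence(X, y, lam, n_iter=200):
    """Euclidean distance between the full-data LASSO fit and each leave-one-out fit."""
    full = lasso_cd(X, y, lam, n_iter)
    scores = []
    for k in range(len(X)):
        b_k = lasso_cd(X[:k] + X[k + 1:], y[:k] + y[k + 1:], lam, n_iter)
        scores.append(math.sqrt(sum((a - c) ** 2 for a, c in zip(full, b_k))))
    return scores


# Hypothetical toy high-dimensional data (p > n) with an outlier planted in observation 0.
rng = random.Random(1)
n, p = 10, 15
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3.0 * row[0] + rng.gauss(0, 0.1) for row in X]
y[0] += 10.0
scores = case_deletion_influence(X, y, lam=0.5)
ranking = sorted(range(n), key=lambda k: -scores[k])
print("observations ranked by influence:", ranking)
```

To get a simple influence display, plot `scores` against the observation index. Repeating the computation over a grid of `lam` values shows how stable that ranking is across penalty levels.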

Machine Learning Prediction of Economic Effects of Busan's Strategic Industry through Ridge Regression and Lasso Regression (릿지 회귀와 라쏘 회귀 모형에 의한 부산 전략산업의 지역경제 효과에 대한 머신러닝 예측)

  • Yi, Chae-Deug
    • Journal of Korea Port Economic Association / v.37 no.1 / pp.197-215 / 2021
  • This paper uses ridge regression and lasso regression, two models with regularization terms, to make machine learning predictions of the economic effects of Busan's strategic industries on employment and income. According to the ridge and lasso estimation models of employment, the intelligence information service industry (such as the service platform, contents, and smart finance industries) and the global tourism industry (such as MICE and specialized tourism) are predicted to have the largest effects on employment, in that order. However, both the ridge and the lasso models show that the future transportation machine industry does not significantly increase employment or income, since it is still at an early investment stage. The ridge estimation models of income likewise predict that the intelligence information service industry and the global tourism industry have the largest effects on income, in that order. According to the lasso estimation models of income, four strategic industries (life care, smart maritime, intelligence machine, and clean tech) do not influence income. Furthermore, the future transportation machine industry may influence income negatively, since it is still at an early investment stage. Thus, appropriate economic objectives and priorities for industrial policy need to be selected.
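The ridge and lasso models reach different conclusions: lasso reports that some industries have no influence at all, while ridge keeps every industry in the model. This difference comes from how the two penalties shrink coefficients. The textbook illustration below makes that concrete. It assumes an orthonormal design ($X^\top X = I$) and the objectives ½‖y − Xb‖² + (λ/2)‖b‖² for ridge and ½‖y − Xb‖² + λ‖b‖₁ for lasso. The coefficient values are made up and are not taken from the paper.

```python
def ridge_orthonormal(b_ols, lam):
    """Ridge solution when X^T X = I: every coefficient is scaled by 1 / (1 + lam)."""
    return [b / (1.0 + lam) for b in b_ols]


def lasso_orthonormal(b_ols, lam):
    """Lasso solution when X^T X = I: soft-thresholding, which sets small coefficients to exactly 0."""
    return [max(abs(b) - lam, 0.0) * (1.0 if b > 0 else -1.0) for b in b_ols]


# Hypothetical least-squares coefficients for three predictors.
b_ols = [2.0, 0.3, -1.0]
print("ridge:", ridge_orthonormal(b_ols, lam=1.0))
print("lasso:", lasso_orthonormal(b_ols, lam=0.5))
```

Ridge shrinks the small coefficient 0.3 but keeps it. Lasso removes that coefficient entirely, which corresponds to a finding that a predictor "does not influence" the response.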

Detection of multiple change points using penalized least square methods: a comparative study between ℓ0 and ℓ1 penalty (벌점-최소제곱법을 이용한 다중 변화점 탐색)

  • Son, Won;Lim, Johan;Yu, Donghyeon
    • The Korean Journal of Applied Statistics / v.29 no.6 / pp.1147-1154 / 2016
  • In this paper, we numerically compare two penalized least squares methods, the ${\ell}_0$-penalized method and fused lasso regression (FLR, ${\ell}_1$ penalization), for finding multiple change points in a signal. We find that the ${\ell}_0$-penalized method performs better than the FLR, which, as theory predicts, produces many false detections in some cases. In addition, the ${\ell}_0$-penalized method is computed by dynamic programming and is as efficient as the FLR.
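The dynamic-programming side of this comparison can be sketched compactly. The sketch below covers only the ${\ell}_0$-penalized method and does not show the FLR. It assumes two things: each segment costs its squared error around its own mean, and a fixed penalty `beta` is charged per change point. With those assumptions it solves the problem exactly with the classic "optimal partitioning" recursion, which takes O(n²) time. The function name is hypothetical, and the authors' implementation may use a different or pruned recursion.

```python
def l0_changepoints(y, beta):
    """Minimize (sum of within-segment squared errors) + beta * (number of change points)
    by dynamic programming over the position of the last change point."""
    n = len(y)
    s, s2 = [0.0] * (n + 1), [0.0] * (n + 1)  # prefix sums of y and y^2
    for i, v in enumerate(y):
        s[i + 1] = s[i] + v
        s2[i + 1] = s2[i] + v * v

    def seg_cost(a, b):
        """Squared error of y[a:b] around its own mean, in O(1) from prefix sums."""
        tot = s[b] - s[a]
        return (s2[b] - s2[a]) - tot * tot / (b - a)

    F = [0.0] * (n + 1)  # F[t]: optimal penalized cost of y[:t]
    F[0] = -beta  # cancels the penalty charged to the first segment
    last = [0] * (n + 1)  # start index of the final segment in the optimum for y[:t]
    for t in range(1, n + 1):
        best, arg = float("inf"), 0
        for tau in range(t):
            val = F[tau] + seg_cost(tau, t) + beta
            if val < best:
                best, arg = val, tau
        F[t], last[t] = best, arg

    # Backtrack the segment starts; every start other than 0 is a change point.
    cps, t = [], n
    while t > 0:
        t = last[t]
        if t > 0:
            cps.append(t)
    return sorted(cps)


signal = [0.0] * 10 + [5.0] * 10 + [1.0] * 10
print(l0_changepoints(signal, beta=1.0))
```

Larger values of `beta` trade fit for fewer change points. In practice `beta` plays the same tuning role that the penalty level plays in the FLR.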

Labor-saving practices in Tartary buckwheat(Fagopyrum tataricum) production (타타리메밀의 생력재배 기술)

  • Lim, Yong-Sup;Park, Byoung-Jae;Park, Cheol-Ho;Park, Jong-In;Kim, Yang-Sik;Park, Kwang-Ho;Kang, Yun-Kyu;Chang, Kwang-Jin
    • Korean Journal of Plant Resources / v.22 no.4 / pp.359-363 / 2009
  • To establish labor-saving cultivation technology for Tartary buckwheat, three cultural practices were compared: hand planting, a drill sowing machine, and a soil-cover direct seeding machine. The highest grain yield, 3.4 g per plant, was found in the soil-cover direct seeding plot. As a result, grain yield was estimated at 113 kg with soil-cover direct seeding and 80 kg with hand scattering. In addition, three herbicide treatments were tested for weed control: Alachlor alone, a mixture of Alachlor and Paraquat dichloride, and a combination of Alachlor and Paraquat dichloride. The mixture gave over 90% weed control, and the highest grain yield was found in the combination treatment. The combine harvester reduced the grain loss ratio and working hours, raising working efficiency to 15 to 20 min/10a.

Lasso Regression of RNA-Seq Data based on Bootstrapping for Robust Feature Selection (안정적 유전자 특징 선택을 위한 유전자 발현량 데이터의 부트스트랩 기반 Lasso 회귀 분석)

  • Jo, Jeonghee;Yoon, Sungroh
    • KIISE Transactions on Computing Practices / v.23 no.9 / pp.557-563 / 2017
  • When large-scale gene expression data are analyzed with lasso regression, the regression coefficient estimates may be unstable because the expression values of associated genes are highly correlated. Because L1 regularization shrinks the coefficients, this instability makes variable selection difficult. To address this problem, we propose a regression model that applies repeated bootstrapping to the gene expression values before lasso regression. Genes selected with high frequency were used to build each regression model. Our experimental results show that several genes were consistently selected in all regression models, and we verified that these genes were not false positives. We also found that the sign distribution of the regression coefficients of the genes selected in each model was correlated with the real dependent variables.
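The bootstrap-then-lasso idea works like this: refit the lasso on many resampled data sets and count how often, and with which sign, each variable is selected. The sketch below assumes a squared-error lasso without an intercept, fitted by coordinate descent, with a fixed `lam` across resamples. The toy data, the 30 resamples, and the 80% "stable" threshold are hypothetical; the paper's tuning and thresholds are not given here.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for (1/(2n))*||y - Xb||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    r = list(y)  # residual y - Xb
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = sum(X[i][j] * (r[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


def bootstrap_selection(X, y, lam, n_boot=30, seed=0):
    """Selection frequency and count of positive signs for each variable over bootstrap refits."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    selected, positive = [0] * p, [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # rows drawn with replacement
        b = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j, bj in enumerate(b):
            if bj != 0.0:
                selected[j] += 1
                if bj > 0.0:
                    positive[j] += 1
    return [c / n_boot for c in selected], positive


# Hypothetical data: 30 samples, 5 "genes", response driven by gene 0 only.
rng = random.Random(42)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(30)]
y = [5.0 * row[0] + rng.gauss(0, 0.1) for row in X]
freq, positive = bootstrap_selection(X, y, lam=0.1)
stable = [j for j, f in enumerate(freq) if f >= 0.8]  # hypothetical 80% threshold
print(freq, stable)
```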

Case study: Selection of the weather variables influencing the number of pneumonia patients in Daegu Fatima Hospital (사례연구: 대구 파티마 병원 폐렴 입원 환자 수에 영향을 미치는 날씨 변수 선택)

  • Choi, Sohyun;Lee, Hag Lae;Park, Chungun;Lee, Kyeong Eun
    • Journal of the Korean Data and Information Science Society / v.28 no.1 / pp.131-142 / 2017
  • The number of hospital admissions for pneumonia tends to increase every year. Moreover, pneumonia, the fifth leading cause of death among older adults, is one of the top diseases in terms of hospitalization rate. Although pneumonia is caused mainly by bacteria and viruses, weather is also related to its occurrence. The candidate weather variables are humidity, amount of sunshine, diurnal temperature range, daily mean temperature, and particle density. Because pneumonia occurs with a delay after exposure, lagged weather variables are also considered. Year effects, holiday effects, and seasonal effects are considered as well. We select the variables related to the occurrence of pneumonia using penalized generalized linear models.
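This study has two technical steps: building lagged weather covariates, and fitting a penalized GLM to count data. A minimal sketch of both steps follows, under stated assumptions. First, it assumes a Poisson family with log link; the abstract only says "penalized generalized linear models". Second, it uses an L1 (lasso) penalty on the slopes with an unpenalized intercept, fitted by proximal gradient descent with a fixed step size. Third, the inputs are assumed to be already standardized. All names and numbers are hypothetical.

```python
import math


def lagged_design(series, max_lag):
    """Rows [x_t, x_{t-1}, ..., x_{t-max_lag}] for t = max_lag, ..., len(series) - 1."""
    return [[series[t - k] for k in range(max_lag + 1)]
            for t in range(max_lag, len(series))]


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def poisson_lasso(X, y, lam, step=0.02, n_iter=3000):
    """Proximal gradient for (1/n) * sum(exp(eta_i) - y_i * eta_i) + lam * ||b||_1,
    where eta_i = b0 + x_i . b and the intercept b0 is not penalized."""
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(n_iter):
        mu = [math.exp(b0 + sum(xij * bj for xij, bj in zip(row, b))) for row in X]
        resid = [m - yi for m, yi in zip(mu, y)]  # derivative of the loss w.r.t. eta_i
        g0 = sum(resid) / n
        g = [sum(resid[i] * X[i][j] for i in range(n)) / n for j in range(p)]
        b0 -= step * g0
        # Gradient step on the slopes, then the L1 proximal map (soft-thresholding).
        b = [soft_threshold(bj - step * gj, step * lam) for bj, gj in zip(b, g)]
    return b0, b


# Hypothetical standardized daily temperature range and daily admission counts.
temp_range = [0.2, -1.1, 0.5, 1.3, -0.4, 0.9, -0.7, 0.1]
X = lagged_design(temp_range, max_lag=2)  # same-day, 1-day and 2-day lags
admissions = [3, 5, 2, 6, 4, 5]  # one count per row of X
b0, b = poisson_lasso(X, admissions, lam=0.3)
print(b0, b)
```

Each additional weather variable would contribute its own block of lagged columns. Year, holiday, and seasonal effects would enter as extra indicator columns.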

Permutation test for a post selection inference of the FLSA (순열검정을 이용한 FLSA의 사후추론)

  • Choi, Jieun;Son, Won
    • The Korean Journal of Applied Statistics / v.34 no.6 / pp.863-874 / 2021
  • In this paper, we propose a post-selection inference procedure for the fused lasso signal approximator (FLSA). The FLSA finds an underlying sparse, piecewise constant mean structure by applying the total variation (TV) semi-norm as a penalty term. However, it is widely known that this convex relaxation can make change point detection asymptotically inconsistent. As a result, false change points can remain even when the best subset of change points is chosen by a tuning procedure. To remove these false change points, we propose post-selection inference for the FLSA. The proposed procedure applies a permutation test based on the CUSUM statistic. It extends the permutation test of Antoch and Hušková (2001), which deals with single change point problems, to multiple change point detection problems in combination with the FLSA. Numerical studies show that the proposed procedure performs better than naïve z-tests and tests based on the limiting distribution of CUSUM statistics.
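The permutation test builds on a simple observation: when a stretch of data has no change point, its observations are exchangeable, so shuffling them should not reduce the CUSUM statistic. The sketch below assumes an unweighted CUSUM, max over k of |Σ_{i≤k}(y_i − ȳ)|; Antoch and Hušková (2001) may use a weighted version. It also introduces a hypothetical pruning rule: each candidate from the FLSA is tested on the segment between its neighbouring candidates. That rule illustrates the idea and is not necessarily the authors' exact procedure.

```python
import random


def cusum_stat(y):
    """Unweighted CUSUM: max over k of |sum_{i<=k} (y_i - mean(y))|."""
    mean = sum(y) / len(y)
    partial, best = 0.0, 0.0
    for v in y[:-1]:
        partial += v - mean
        best = max(best, abs(partial))
    return best


def permutation_pvalue(y, n_perm=999, seed=0):
    """Permutation p-value for H0: no change point in y (observations exchangeable)."""
    rng = random.Random(seed)
    observed = cusum_stat(y)
    z = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(z)
        if cusum_stat(z) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


def prune_change_points(y, candidates, alpha=0.05, n_perm=999, seed=0):
    """Hypothetical rule: keep a candidate only if the segment between its
    neighbouring candidates shows a significant change."""
    bounds = [0] + sorted(candidates) + [len(y)]
    kept = []
    for j in range(1, len(bounds) - 1):
        segment = y[bounds[j - 1]:bounds[j + 1]]
        if permutation_pvalue(segment, n_perm, seed) <= alpha:
            kept.append(bounds[j])
    return kept


# One true change at 20, plus a spurious candidate at 10 (e.g. left over after tuning).
y = [0.0] * 20 + [10.0] * 20
print(prune_change_points(y, candidates=[10, 20]))
```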

An empirical evidence of inconsistency of the ℓ1 trend filtering in change point detection (ℓ1 추세필터의 변화점 식별에 있어서의 비일치성)

  • Yu, Donghyeon;Lim, Johan;Son, Won
    • The Korean Journal of Applied Statistics / v.35 no.3 / pp.371-384 / 2022
  • The fused LASSO signal approximator (FLSA) can be applied to find change points in data with a piecewise constant mean structure. It is well known that the FLSA is inconsistent in change point detection, and this inconsistency is due to the total-variation denoising penalty of the FLSA. The ℓ1 trend filter, a popular tool for finding an underlying trend in data, can be used to identify change points of piecewise linear trends. Since the ℓ1 trend filter penalizes the sum of absolute values of slope differences, it can, like the FLSA, be inconsistent in recovering change points. However, there are few studies on the inconsistency of ℓ1 trend filtering. In this paper, we demonstrate the inconsistency of ℓ1 trend filtering through a numerical study.
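The two penalties being contrasted can be written side by side. These are the standard formulations. The ½ scaling and the tuning parameter λ are conventions assumed here, and the paper's own notation may differ.

```latex
% FLSA: penalizes first differences, so the fitted \hat\theta is piecewise constant.
\hat{\theta}^{\mathrm{FLSA}} = \arg\min_{\theta \in \mathbb{R}^n}
  \frac{1}{2}\sum_{i=1}^{n} (y_i - \theta_i)^2
  + \lambda \sum_{i=1}^{n-1} \lvert \theta_{i+1} - \theta_i \rvert

% \ell_1 trend filter: penalizes second differences, i.e. slope changes
% (\theta_{i+1} - \theta_i) - (\theta_i - \theta_{i-1}),
% so the fitted \hat\theta is piecewise linear with kinks at the change points.
\hat{\theta}^{\mathrm{TF}} = \arg\min_{\theta \in \mathbb{R}^n}
  \frac{1}{2}\sum_{i=1}^{n} (y_i - \theta_i)^2
  + \lambda \sum_{i=2}^{n-1} \lvert \theta_{i+1} - 2\theta_i + \theta_{i-1} \rvert
```

Both are ℓ1 penalties on a difference operator applied to θ, of first order for the FLSA and second order for the trend filter. This shared convex-relaxation structure is why the FLSA's inconsistency argument suggests the same concern for the trend filter.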

Forecasting Korea's GDP growth rate based on the dynamic factor model (동적요인모형에 기반한 한국의 GDP 성장률 예측)

  • Kyoungseo Lee;Yaeji Lim
    • The Korean Journal of Applied Statistics / v.37 no.2 / pp.255-263 / 2024
  • GDP represents the total market value of goods and services produced by all economic entities in a country, including households, businesses, and governments, during a specific time period. As a representative economic indicator that reflects the size of a country's economy and influences government policy, it has been the subject of many studies. This paper presents a GDP growth rate forecasting model based on a dynamic factor model using key macroeconomic indicators of G20 countries. The extracted factors are combined with various regression methods, and the results are compared. Traditional time series forecasting methods, such as the ARIMA model and forecasting using common components, are also evaluated. Given the significant volatility of indicators following the COVID-19 pandemic, the forecast period is divided into pre-COVID and post-COVID periods. The findings show that the dynamic factor model combined with ridge regression and lasso regression performs best both before and after COVID.
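The approach extracts factors from many indicators and then regresses GDP growth on them with ridge or lasso. The sketch below is deliberately simplified and makes four assumptions:

- It uses a single static principal-components factor, a "diffusion index" style estimate, rather than a full dynamic factor model with factor dynamics.
- The factor is obtained by power iteration on the correlation matrix.
- Forecasts are direct, one period ahead.
- A ridge penalty is applied to the factor loading, and the intercept is unpenalized.

The function names, the toy panel, and the growth series are all hypothetical.

```python
import math
import random


def standardize(panel):
    """Column-standardize panel[t][i] (indicator i at time t) to mean 0, variance 1."""
    T, N = len(panel), len(panel[0])
    Z = [[0.0] * N for _ in range(T)]
    for i in range(N):
        col = [panel[t][i] for t in range(T)]
        m = sum(col) / T
        sd = math.sqrt(sum((c - m) ** 2 for c in col) / T) or 1.0  # guard constant columns
        for t in range(T):
            Z[t][i] = (col[t] - m) / sd
    return Z


def first_factor(panel, n_iter=200, seed=0):
    """First principal-component factor of the standardized panel, via power iteration
    on the sample correlation matrix (assumes a nonzero leading eigenvalue)."""
    Z = standardize(panel)
    T, N = len(Z), len(Z[0])
    C = [[sum(Z[t][i] * Z[t][j] for t in range(T)) / T for j in range(N)] for i in range(N)]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(N)]
    for _ in range(n_iter):
        w = [sum(C[i][j] * v[j] for j in range(N)) for i in range(N)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(Z[t][i] * v[i] for i in range(N)) for t in range(T)]


def ridge_factor_forecast(factor, growth, lam):
    """Fit growth[t+1] = a + b * factor[t] with penalty lam * b^2 (a unpenalized),
    then forecast the next period's growth from factor[-1]."""
    f, y = factor[:-1], growth[1:]
    n = len(f)
    f_mean, y_mean = sum(f) / n, sum(y) / n
    sxy = sum((a - f_mean) * (b - y_mean) for a, b in zip(f, y))
    sxx = sum((a - f_mean) ** 2 for a in f)
    slope = sxy / (sxx + lam)
    return y_mean + slope * (factor[-1] - f_mean)


# Hypothetical panel: three indicators driven by one underlying series s.
s = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
panel = [[v, 2.0 * v, -v] for v in s]
growth = [0.0] + [0.5 + 0.3 * v for v in s[:-1]]  # hypothetical growth led by one period
print(ridge_factor_forecast(first_factor(panel), growth, lam=0.0))
```

The sign of a principal-component factor is arbitrary. The fitted slope flips along with it, so the forecast does not depend on the sign. Increasing `lam` pulls the forecast toward the sample mean of growth.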

Effect of Weed Control and Lodging Reduction for Increase the Grain Yield of Buckwheat (메밀증수를 위한 잡초방제 및 도복경감 효과)

  • Heo, Kwon;Lee, Han-Bum;Park, Chul-Ho;Choi, Yong-Soon
    • Korean Journal of Plant Resources / v.13 no.3 / pp.243-248 / 2000
  • This study was conducted to evaluate the effects of weed control and lodging reduction in buckwheat cultivation. Weed control had a significant effect. Nevertheless, grain yield and plant height decreased more in the herbicide plot than in the habitual and vinyl mulching plots. Therefore, herbicide application was considered unnecessary for buckwheat, which has a short growth period. In the habitual plot, the dominant weed species were Digitaria sanguinalis, Erigeron annuus, E. canadensis, Setaria viridis, and Stellaria alsine var. undulata. The plant dwarfing agents C.C.C. and TIBA made plants shorter than in the habitual plot, but grain yield decreased. The later the tip pinching, the lower the grain yield, which indicates that tip pinching is ineffective for grain yield and lodging reduction. Among the wild species and cultivars of Fagopyrum, F. urophyllum had differentiated xylem and phloem tissues, indicating a woody plant. The stem hardness of this species was the highest, at 625,110,000 dyne/cm$^2$, at least 3.5 times that of F. esculentum cv. Suwon #12. Therefore, a buckwheat breeding strategy should transfer the woody habit gene of F. urophyllum into other cultivars.
