• Title/Summary/Keyword: two-variable linear regression analysis


Prediction Techniques for Difficulty Level of Hanja Using Multiple Linear Regression (다중 회귀 분석을 이용한 한자 난이도 예측 기법 연구)

  • Choi, Jeongwhan;Noh, Jiwoo;Kim, Suntae
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.19 no.6 / pp.219-225 / 2019
  • Existing methods for assigning difficulty levels to Hanja characters have a problem: some of the characters they select differ from the Sino-Korean words used in real life, and they give no indication of how often each character is used. To solve this problem, we measure the difficulty of Hanja characters using multiple regression analysis with frequencies as features. FWS and FHU are counted from elementary school textbooks. A questionnaire built from these two frequencies together with stroke count asks for the appropriate time to learn each Hanja character, and the answers are used as the target variable for regression. Stepwise regression is used to select appropriate features, and multiple linear regression is then performed. The $R^2$ score of the model was 0.1105 and the RMSE was 0.1105.
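As a rough illustration of the final step described in this abstract, the sketch below fits a linear model with two predictors and reports $R^2$ and RMSE. It is not the authors' code: the function names (`solve`, `fit_ols`, `r2_and_rmse`) are hypothetical, the data in the usage note are made up, and the stepwise feature selection is not shown.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_ols(X, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(XtX, Xty)

def r2_and_rmse(X, y, beta):
    """Return (R^2, RMSE) of the fitted model on the given data."""
    preds = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, preds))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / len(y))
```

For example, `beta = fit_ols([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]], y)` followed by `r2_and_rmse(...)` on the same data gives the in-sample scores; a held-out set would be needed for an honest estimate.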

Mixed dentition analysis using a multivariate approach (다변량 기법을 이용한 혼합치열기 분석법)

  • Seo, Seung-Hyun;An, Hong-Seok;Lee, Shin-Jae;Lim, Won Hee;Kim, Bong-Rae
    • The Korean Journal of Orthodontics / v.39 no.2 / pp.112-119 / 2009
  • Objective: To develop a mixed dentition analysis method that takes the normal variation of tooth sizes into account. Methods: According to the tooth size of the maxillary central incisor, maxillary 1st molar, mandibular central incisor, mandibular lateral incisor, and mandibular 1st molar, 307 normal occlusion subjects were clustered into smaller and larger tooth-size groups. Multiple regression analyses were then performed to predict the sizes of the canine and premolars for the 2 groups and both genders separately. For a cross-validation dataset, 504 malocclusion patients were assigned to the 2 groups, and the multiple regression equations were applied. Results: The maximum errors of the predicted space for the canine and the 1st and 2nd premolars were 0.71 and 0.82 mm residual standard deviation for the normal occlusion and malocclusion groups, respectively. For malocclusion patients, the prediction errors did not differ significantly by type of malocclusion or by tooth-size group. The frequencies of prediction errors greater than 1 mm and 2 mm were 17.3% and 1.8%, respectively. The overall prediction accuracy was dramatically improved in this study compared to that of previous studies. Conclusions: The computer-aided calculation method used in this study appeared to be more efficient.

Prediction of Seasonal Nitrate Concentration in Springs on the Southern Slope of Jeju Island using Multiple Linear Regression of Geographic Spatial Data (지리 공간 자료의 다중회귀분석을 이용한 제주도 남측사면 용천수의 시기별 질산성 질소 농도 예측)

  • Jung, Youn-Young;Koh, Dong-Chan;Kang, Bong-Rae;Ko, Kyung-Suk;Yu, Yong-Jae
    • Economic and Environmental Geology / v.44 no.2 / pp.135-152 / 2011
  • Nitrate concentrations in springs on the southern slope of Jeju Island were predicted using multiple linear regression (MLR) of spatial variables including hydrogeological parameters and land use characteristics. Springs showed a wide range of nitrate concentrations, from <0.02 to 86 mg/L, with a mean of 20 mg/L. Spatial variables were generated for a circular buffer, with the optimal buffer radius set to 400 m. Selected regression models were tested using p values and Durbin-Watson statistics. Explanatory variables were selected using the adjusted $R^2$, Cp (total squared error), AIC (Akaike's Information Criterion), and significance. In addition, mutual linear relations between variables were considered. A small portion of springs, usually <10% of total samples, were identified as outliers, indicating limitations of MLR using circular buffers. The adjusted $R^2$ of the proposed models improved from 0.75 to 0.87 when outliers were eliminated. In particular, the areal proportion of natural area had the greatest influence on the nitrate concentrations in springs. Among anthropogenic land uses, the influence on nitrate contamination decreases in the order of orchard, residential area, and dry farmland. The quality of springs in the study area appears to be controlled by land uses rather than hydrogeological parameters. Above all, the contamination susceptibility of springs is highly sensitive to nearby land uses, in particular orchards.
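The model-selection and diagnostic statistics named in this abstract have simple closed forms. The sketch below shows the standard textbook definitions of adjusted $R^2$, a Gaussian least-squares AIC (up to an additive constant), and the Durbin-Watson statistic. The function names and argument conventions are assumptions for illustration, not taken from the paper.

```python
import math

def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def aic_gaussian(rss, n, k):
    """AIC for a least-squares fit, up to an additive constant.

    rss is the residual sum of squares, k the number of estimated parameters.
    """
    return n * math.log(rss / n) + 2 * k

def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 suggest no first-order autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den
```

When comparing candidate variable sets on the same data, a higher adjusted $R^2$ and a lower AIC both favor a model; the additive constant dropped from the AIC does not affect such comparisons.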

Long-term Variations of Water Quality Parameters in Lake Kyoungpo (경포호에서 수질변수들의 장기적인 변화)

  • Kwak, Sungjin;Bhattrai, Bal Dev;Choi, Kwansoon;Heo, Woomyung
    • Korean Journal of Ecology and Environment / v.48 no.2 / pp.95-107 / 2015
  • To identify long-term trends of water quality parameters in Lake Kyeongpo, the Mann-Kendall test, Sen's slope estimator, and linear regression were applied to data on 15 parameters from three different sites, plus rainfall, monitored once every two months from March to November during 1998~2013. Seasonal variation analysis used only the Mann-Kendall test and Sen's slope estimator. The analysis showed that salinity, transparency, and nutrient variables (total phosphorus, dissolved inorganic phosphorus, total nitrogen, nitrate nitrogen, ammonia nitrogen) were the only parameters with statistically significant trends. In the linear regression analysis, salinity (surface and bottom layers of all sites) and transparency (only at site 1) showed statistically significant increasing trends, while in the non-parametric method, salinity and transparency at all sites (surface, middle, deep) showed statistically significant increasing trends. Water quality parameters showing statistically significant decreasing trends were dissolved oxygen (surface layer of site 1 and bottom layer of sites 2 and 3), total phosphorus (sites 1 and 2), dissolved inorganic phosphorus, total nitrogen, nitrate nitrogen, and ammonia nitrogen in the linear regression analysis, and dissolved oxygen (bottom layer of all sites), total phosphorus, dissolved inorganic phosphorus, total nitrogen, nitrate nitrogen, and ammonia nitrogen in the non-parametric method. The seasonal trend analysis showed increasing trends for salinity, turbidity, transparency, and suspended solids in spring; salinity, transparency, nitrate nitrogen, and suspended solids in summer; and temperature, salinity, transparency, and suspended solids in fall. In general, rainfall during the research period showed a decreasing trend. The significant decreasing trends of nutrients in Lake Kyeongpo are believed to be related to the lagoon restoration and water management project run by Gangneung city and to the removal of an underwater weir, but further detailed studies are needed to determine the exact causes.
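The two non-parametric tools named in this abstract can be sketched in a few lines. The version below is a minimal illustration, assuming equally spaced observations and no tied values (so the tie correction to the variance is omitted). The function names are hypothetical and this is not the authors' implementation.

```python
import math
from statistics import median

def mann_kendall(values):
    """Return (S, Z, two-sided p) for the Mann-Kendall trend test (no tie correction)."""
    n = len(values)
    # S counts concordant minus discordant pairs over all i < j.
    s = sum(
        (values[j] > values[i]) - (values[j] < values[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return s, z, p

def sens_slope(values):
    """Sen's slope: the median of all pairwise slopes, with time steps of 1."""
    n = len(values)
    slopes = [
        (values[j] - values[i]) / (j - i)
        for i in range(n - 1)
        for j in range(i + 1, n)
    ]
    return median(slopes)
```

A positive `S` with a small p value indicates a significant increasing trend, and `sens_slope` gives its magnitude per sampling interval. Real monitoring series with ties or missing samples would need the tie-corrected variance and actual time stamps.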

A Study for Improving the Performance of Data Mining Using Ensemble Techniques (앙상블기법을 이용한 다양한 데이터마이닝 성능향상 연구)

  • Jung, Yon-Hae;Eo, Soo-Heang;Moon, Ho-Seok;Cho, Hyung-Jun
    • Communications for Statistical Applications and Methods / v.17 no.4 / pp.561-574 / 2010
  • We studied the performance of 8 data mining algorithms, including decision trees, logistic regression, LDA, QDA, neural networks, and SVM, and their combinations with 2 ensemble techniques, bagging and boosting. In this study, we utilized 13 data sets with binary responses. Sensitivity, specificity, and misclassification error were used as criteria for comparison.
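The three comparison criteria in this abstract follow directly from the confusion matrix of a binary classifier. The sketch below uses the standard definitions; the function name `binary_metrics` and the 0/1 label encoding are assumptions for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Return (sensitivity, specificity, misclassification error) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)   # share of actual positives predicted positive
    specificity = tn / (tn + fp)   # share of actual negatives predicted negative
    error = (fp + fn) / len(y_true)
    return sensitivity, specificity, error
```

The function assumes both classes appear in `y_true`; otherwise one of the denominators is zero.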

Analysis of Factors Influencing Patent Citations: Focused on Korea Medical Device Patents (특허 인용에 영향을 미치는 요인 분석: 국내의료기기 특허를 중심으로)

  • Yoon, Jae Woong;Lee, Chang Seop;Lee, Suk Jun
    • Journal of the Korean Society for Information Management / v.33 no.2 / pp.103-133 / 2016
  • The valuation of patented technology has recently been emphasized, and patent citation is known to be an important factor. This study fitted a generalized linear model to find variables that affect patent citation. We classified 13 variables as morphological, technological, and conceptual factors and used them to identify influential variables across 14 medical device classifications. Through the empirical study, we found seven influential variables (assignee nationality, assignee character, the number of inventors, the number of application countries, the number of IPC, the number of references, the strength of bibliographic coupling). This study is significant in that it provides basic research toward a citation analysis model applicable to Korean industry.

Analysis of Urban Heat Island Effect Using Information from 3-Dimensional City Model (3DCM) (3차원 도시공간정보를 이용한 도시열섬현상의 분석)

  • Chun, Bun-Seok;Kim, Hag-Yeol
    • Spatial Information Research / v.18 no.4 / pp.1-11 / 2010
  • Unlike previous studies, which have focused on 2-dimensional urban characteristics, this paper presents statistical models explaining the urban heat island (UHI) effect using 3-dimensional urban morphologic information and addresses its policy implications. Three-dimensional information for Columbus, Ohio is captured from LiDAR data, and building boundary information is extracted from a building digital map. Finally, NDVI and temperature data are calculated by manipulating band 3, band 4, and the thermal band of Landsat images. Through complicated data processing, 6 independent variables (building surface area, building volume, height-to-width ratio, porosity, plan surface area) are introduced in simple and multiple linear regression models. The regression models are specified by the Box-Tidwell method, which finds the power to which an independent variable needs to be raised to achieve linearity. Porosity, NDVI, and building surface area are chosen as explanatory variables in the final multiple regression model, which explains about 57% of the variability in temperatures. For reducing UHI, the results provide guidelines for policy-making in open space, roof garden, and vertical garden management.

Computing Algorithm for Genetic Evaluations on Several Linear and Categorical Traits in A Multivariate Threshold Animal Model (범주형 자료를 포함한 다형질 임계개체모형에서 유전능력 추정 알고리즘)

  • Lee, D.H.
    • Journal of Animal Science and Technology / v.46 no.2 / pp.137-144 / 2004
  • Algorithms for estimating breeding values for several categorical traits using latent variables under the threshold concept were developed and demonstrated. Thresholds for each categorical trait were estimated by Newton's method using the gradient and Hessian matrix. The algorithm was developed by extending the bivariate analysis of Quaas (2001). Breeding values for the latent variables of categorical traits and for observations on linear traits were estimated by the preconditioned conjugate gradient (PCG) method, which is known for its fast convergence. An example was given with simulated data with two linear traits, a categorical trait with four categories (CE = calving ease), and a dichotomous trait (SB = stillbirth) in a threshold animal mixed model (TAMM). Breeding value estimates in TAMM were compared with those in a linear animal mixed model (LAMM). The correlations between breeding value estimates and their parameter values were 0.91~0.92 for CE and 0.87~0.89 for SB in TAMM, and 0.72~0.84 for CE and 0.59~0.70 for SB in LAMM. In conclusion, the PCG method was feasible for estimating breeding values for several categorical traits together with linear traits in TAMM.
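For readers unfamiliar with the PCG method mentioned in this abstract, the sketch below solves a small symmetric positive-definite system with a diagonal (Jacobi) preconditioner. The choice of preconditioner, the dense list-of-lists matrix, and the function name `pcg` are assumptions for illustration; the paper's mixed model equations and its preconditioner are not reproduced here.

```python
import math

def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A with Jacobi-preconditioned CG."""
    n = len(b)
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    matvec = lambda M, v: [dot(row, v) for row in M]
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner M^-1

    x = [0.0] * n
    r = list(b)                                    # residual b - A x for x = 0
    if math.sqrt(dot(r, r)) < tol:
        return x
    z = [inv_diag[i] * r[i] for i in range(n)]     # preconditioned residual
    p = list(z)                                    # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)                    # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(dot(r, r)) < tol:
            break
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz                         # keeps directions A-conjugate
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

In large animal-model applications the coefficient matrix is sparse and is usually never formed explicitly; only the matrix-vector product is needed, which is what makes this family of methods attractive.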

Development of Korean Peninsula VS30 Map Based on Proxy Using Linear Regression Analysis (일반선형회귀분석을 이용한 프락시 기반 한반도 VS30지도 개발)

  • Choi, Inhyeok;Yoo, Byeongho;Kwak, Dongyoup
    • KSCE Journal of Civil and Environmental Engineering Research / v.42 no.1 / pp.35-44 / 2022
  • The VS30 map is used as a key variable for site amplification in the ShakeMap, which predicts ground motion at any site. However, no VS30 map considering Korean geology and geomorphology has been developed yet. To develop a proxy-based VS30 map, we used 1,101 VS profiles obtained from geophysical surveys and collected proxy layers of geological and topographical information for the Korean Peninsula. VS30 prediction models were then developed using linear regression analysis for each geological age, considering the distribution of VS30. As a result, models depending on geomorphology were proposed for each geologic group, including the Quaternary, Fill, Ocean, Mesozoic, and Precambrian groups. The resolution of the map is double that of the VS30 map by the U.S. Geological Survey (USGS). The standard deviation of the natural-log residuals is 0.233 for the proxy-based VS30 map, compared with 0.387 for the slope-based USGS VS30 map. Therefore, the proxy-based VS30 map developed in this study is expected to have less uncertainty and to contribute to more accurate prediction of ground motion amplitude.
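The comparison metric quoted in this abstract, the standard deviation of residuals in natural log units, can be computed as sketched below. The residual is taken here as ln(observed) minus ln(predicted) and the sample (n - 1) standard deviation is used; both conventions, and the function name, are assumptions, since the abstract does not state them.

```python
import math
from statistics import stdev

def log_residual_std(observed, predicted):
    """Sample standard deviation of ln(observed) - ln(predicted)."""
    residuals = [math.log(o) - math.log(p) for o, p in zip(observed, predicted)]
    return stdev(residuals)
```

A smaller value means the map's predictions scatter less around the measured VS30 values on a multiplicative scale.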

Corporate Diversification and Common Stock Returns under the Differential Information Hypothesis

  • Choi, Yong-Sik
    • The Korean Journal of Financial Management / v.11 no.2 / pp.65-81 / 1994
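  • The capital asset pricing model (CAPM), which explains the relationship between a stock's expected return and its systematic risk, has been tested continuously by many finance scholars over the past 30 years. Such tests are difficult because they are tests of a joint hypothesis that also includes capital market efficiency, but a series of studies has pointed out that a new risk variable is needed to explain previously documented stock price anomalies. Among the studies in this direction, the differential information hypothesis holds that the amount of information available for investment analysis determines the uncertainty of risk measurement, so that stock returns vary accordingly. This study empirically tests the differential information hypothesis under the assumption that, as a firm diversifies, the amount of information increases through the accounting information of each business unit and the data collected on the industries to which the units belong. A portfolio analysis that controls for firm size shows that firms with a low diversification index have systematically higher excess returns. This result can be interpreted as consistent with the prediction of the differential information hypothesis. However, the analysis of diversification portfolios formed without controlling for firm size, and the firm-level regression results, show that the relationship between excess returns and the degree of diversification is U-shaped rather than linear.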
