• Title/Summary/Keyword: multiple linear regression model (다중선형회귀모델)

Estimating soil moisture using machine learning approach: A Case Study to Yongdam watershed (기계학습 기반의 토양함수 예측 기법 개발 (용담댐 시험유역을 중심으로))

  • Huy, Nguyen Dinh;Kwon, Hyun-Han
    • Proceedings of the Korea Water Resources Association Conference / 2018.05a / pp.167-167 / 2018
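  • Soil moisture represents the average amount of water contained in the soil and is one of the most important hydrological variables in the hydrologic cycle. This study aims to develop a soil moisture prediction technique using the Support Vector Machine (SVM), a representative machine learning method, with remotely sensed soil moisture data, precipitation, and temperature as predictors. SVM is a machine learning method that uses a kernel function to treat complex non-linear relationships under a linear assumption, and as a global model it has been applied across hydrometeorological fields. An advantage of SVM is that, by tolerating a certain amount of error, it generalizes better than conventional artificial neural networks (ANN), which makes it particularly well suited to use as a prediction model. The aim of this study is to fit the model using historical soil moisture data together with precipitation, temperature, and satellite-based observations and to extend it to ungauged watersheds. The proposed model is applied to the Yongdam Dam experimental watershed, and its suitability is evaluated through comparison with an existing ANN model and multiple regression analysis.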
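The error tolerance described for SVM regression above corresponds to an epsilon-insensitive loss, in which residuals smaller than a margin cost nothing. The following is a minimal sketch of that loss only, not the study's model; the function name, the `epsilon` value, and the sample numbers are illustrative assumptions.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Mean epsilon-insensitive loss over paired observations and predictions."""
    # Residuals inside the +/- epsilon tube contribute zero loss.
    losses = [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)


# Made-up soil moisture values (volumetric fraction), for illustration only.
observed = [0.20, 0.25, 0.30]
predicted = [0.22, 0.25, 0.45]

# Only the third residual (0.15) exceeds the 0.05 margin and is penalized.
loss = epsilon_insensitive_loss(observed, predicted, epsilon=0.05)
```

A larger `epsilon` ignores more of the small residuals, which is one way such a model trades training fit for generalization.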

Fertility Evaluation of Upland Fields by Combination of Landscape and Soil Survey Data with Chemical Properties in Soil (토양 화학성과 지형 및 토양 조사자료를 활용한 밭 토양의 비옥도 평가)

  • Hong, Soon-Dal;Kim, Jai-Joung;Min, Kyong-Beum;Kang, Bo-Goo;Kim, Hyun-Ju
    • Korean Journal of Soil Science and Fertilizer / v.33 no.4 / pp.221-233 / 2000
  • A method for evaluating soil fertility using a geographic information system (GIS), incorporating landscape characteristics and soil map data, was investigated based on the productivity of red pepper and tobacco grown in unfertilized fields. A total of 131 field experiments (64 red pepper fields and 67 tobacco fields) were conducted: 22 red pepper and 23 tobacco fields in Cheangweon and Eumseong counties in 1996, 20 and 25 fields in Boeun and Goesan counties in 1997, and 22 and 19 fields in Jincheon and Chungju counties in 1998. All experimental sites were selected to cover a wide range of landscape and soil attributes. Dry weight and nutrient (N, P, and K) uptake by red pepper plants and tobacco leaves were taken as the basic fertility of the soil (BFS). The BFS was estimated from twenty-five independent variables, comprising 13 chemical properties and 12 GIS variables. These variables were classified into two groups, 15 quantitative and 10 qualitative variables, and were analyzed by multiple linear regression (MLR) using the REG and GLM procedures of SAS. In every year, dry weight of red pepper (DWRP) and dry weight of tobacco leaves (DWTL) differed about fivefold between the lowest- and highest-yielding plots, indicating diverse soil fertility among the experimental fields. Evaluation of the BFS by MLR was better than that by simple regression, improving gradually as chemical properties, quantitative GIS variables, and qualitative GIS variables were added. However, the MLR evaluation of the BFS performed better for tobacco than for red pepper. For example, MLR explained 34.2% of the variability in DWTL using chemical properties alone, 35.0% after adding the quantitative variables, and 72.5% after adding both the quantitative and qualitative GIS variables, compared with 21.7% for simple regression on soil $NO_3-N$ content. Consequently, the MLR approach including both quantitative and qualitative variables appears to be usable as an evaluation model of soil fertility for upland fields.
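Combining quantitative and qualitative predictors in one MLR, as the abstract above describes, usually means coding each qualitative variable as 0/1 indicator columns. The sketch below shows that idea in plain Python on made-up data; the variable names (`nitrate`, `texture`), the values, and the normal-equation solver are assumptions for illustration and do not reproduce the SAS analysis.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def dummy_code(value, levels):
    """0/1 indicators for a qualitative value; the first level is the baseline."""
    return [1.0 if value == level else 0.0 for level in levels[1:]]


# Made-up observations: (nitrate content, texture class, dry weight).
# Generated from dry_weight = 2 + 0.5 * nitrate + 3 * [texture == "loam"].
observations = [
    (1.0, "sand", 2.5), (2.0, "sand", 3.0), (3.0, "loam", 6.5),
    (4.0, "loam", 7.0), (5.0, "sand", 4.5), (6.0, "loam", 8.0),
]
levels = ["sand", "loam"]

# Design matrix columns: intercept, quantitative predictor, indicator(s).
design = [[1.0, nitrate] + dummy_code(texture, levels)
          for nitrate, texture, _ in observations]
response = [dry_weight for _, _, dry_weight in observations]

intercept, nitrate_slope, loam_effect = fit_ols(design, response)
```

Dropping the first level as the baseline keeps the indicator columns from being collinear with the intercept, so each remaining coefficient reads as an offset from that baseline class.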

Development of Prediction Model of Subcontract's Bidding-Ratio for Private Apartment Projects (민간 공동주택 하도급 낙찰률 예측모델 개발)

  • Jang, Ki-Suk;Koo, Kyo-Jin
    • Proceedings of the Korean Institute of Building Construction Conference / 2021.11a / pp.250-251 / 2021
  • Subcontract work orders are the basis of the construction process and form the root and trunk of the construction industry. Executing the construction process through subcontract work orders is an important element of project success and the basic unit of profit generation in the industry. Accurate analysis and forecasting of subcontract work orders therefore allow accurate estimation of construction cost and profit, which is the foundation of corporate decision-making. This study was undertaken to provide predictions of subcontractors' bidding ratios for decision-making. Because actual project data were used, the model is expected to contribute substantially in practice. The adjusted coefficient of determination was found to be low because of the limited number of samples. However, accuracy and confidence could be improved by increasing the sample size, considering more variables, and studying ways to reduce error.
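The adjusted coefficient of determination mentioned above penalizes R-squared for the number of predictors relative to the sample size, which is why small samples tend to pull it down. A minimal sketch of the standard formula follows; the R-squared value, predictor count, and sample sizes are hypothetical, not figures from the study.

```python
def adjusted_r_squared(r_squared, n_samples, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_predictors - 1  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("need more samples than predictors plus one")
    return 1.0 - (1.0 - r_squared) * (n_samples - 1) / dof


# Same hypothetical R^2 and predictor count, two hypothetical sample sizes.
small_sample = adjusted_r_squared(0.5, n_samples=11, n_predictors=5)
large_sample = adjusted_r_squared(0.5, n_samples=101, n_predictors=5)
```

With these inputs the small-sample value falls well below the large-sample one even though the raw R-squared is identical.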

Curve Estimation among Citation and Centrality Measures in Article-level Citation Networks (문헌 단위 인용 네트워크 내 인용과 중심성 지수 간 관계 추정에 관한 연구)

  • Yu, So-Young
    • Journal of the Korean Society for Information Management / v.29 no.2 / pp.193-204 / 2012
  • The characteristics of citation and centrality measures in citation networks can be identified using multiple linear regression analyses. In this study, we examine the relationships between bibliometric indices and centrality measures in an article-level co-citation network to determine whether the linear model is the best fitting model and to suggest the necessity of data transformation in the analysis. 703 highly cited articles in Physics published in 2004 were sampled, and four indicators were developed as variables in this study: citation counts, degree centrality, closeness centrality, and betweenness centrality in the co-citation network. As a result, the relationship pattern between citation counts and degree centrality in a co-citation network fits a non-linear rather than linear model. Also, the relationship between degree and closeness centrality measures, or that between degree and betweenness centrality measures, can be better explained by non-linear models than by a linear model. It may be controversial, however, to choose non-linear models as the best-fitting for the relationship between closeness and betweenness centrality measures, as this result implies that data transformation may be a necessary step for inferential statistics.

NAVER Data Lab data-based Assessment of National Awareness Vulnerability of Past Floods over the Korean Peninsula (2011-2018) (NAVER DATA LAB 데이터 기반 과거 한반도 홍수에 대한 대중 인지도 취약성 평가 (2011-2018))

  • Eun Mi Lee;Young Uk Yu;Young hun Jeong;Jong Hun Kam
    • Proceedings of the Korea Water Resources Association Conference / 2023.05a / pp.59-59 / 2023
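  • Heavy rainfall and floods driven by climate change cause river overflow and inland inundation. Typhoon Hinnamnor in September 2022 caused severe damage, including 10 deaths in Pohang and property damage of 1.7 trillion won. This study compared rainfall during flood periods, damage costs, and the population of flooded areas, by si/gun/gu-level administrative district from 2011 to 2018, with NAVER DATA LAB search volume for the term 'flood' (홍수) (data available from 2016). Regions where flood search volume was low during periods of heavy rainfall or high damage costs were defined as regions with vulnerable public awareness of floods. Correlation analysis between 'flood' search volume and rainfall, damage cost, and flood-area population showed high correlation coefficients of 0.86 and 0.81 for rainfall and population, respectively, whereas damage cost showed a relatively low correlation of 0.52. In the 2016-2018 analysis at the special/metropolitan city level, of 17 flood events in total, Incheon Metropolitan City and Sejong Special City ranked 2nd and 3rd in damage cost but 6th and 11th in flood awareness, and were therefore assessed as vulnerable in flood awareness. At the province level, of 34 flood events in total, Gangwon-do and Gyeongsangbuk-do ranked 3rd in damage cost and 10th in rainfall but 27th in flood awareness, and were assessed as vulnerable. Using multiple linear regression, a model was trained on data from 2016 onward to reconstruct estimates of 'flood' search volume before 2016. In the 2011-2015 assessment of special/metropolitan cities, of 25 flood events in total, Busan Metropolitan City ranked 1st in damage cost and 2nd in rainfall but 6th in flood awareness, and was assessed as vulnerable. At the province level, of 50 flood events in total, Chungcheongnam-do and Gyeonggi-do ranked 3rd in damage cost but 7th in flood awareness, and were assessed as vulnerable. By analyzing big data from physical and social systems, this study presents a new socio-hydrological view of social vulnerability to floods and emphasizes the need for convergence research between the social sciences and water resources.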
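The correlation coefficients reported above are presumably Pearson coefficients, although the abstract does not say so. Under that assumption, the calculation can be sketched as follows; the rainfall and search-volume numbers are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented per-event values: rainfall (mm) and relative 'flood' search volume.
rainfall_mm = [120.0, 300.0, 80.0, 450.0, 210.0]
search_volume = [15.0, 40.0, 12.0, 70.0, 22.0]

r = pearson_r(rainfall_mm, search_volume)
```

A value near 1 means search volume rises almost linearly with rainfall; events that fall far below that trend are the kind the study flags as low awareness.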

Suggestion and Evaluation of a Multi-Regression Linear Model for Creep Life Prediction of Alloy 617 (Alloy 617의 장시간 크리프 수명 예측을 위한 다중회귀 선형 모델의 제안 및 평가)

  • Yin, Song-Nan;Kim, Woo-Gon;Jung, Ik-Hee;Kim, Yong-Wan
    • Transactions of the Korean Society of Mechanical Engineers A / v.33 no.4 / pp.366-372 / 2009
  • Creep life is commonly predicted using a time-temperature parameter (TTP) correlated with applied stress and temperature, such as the Larson-Miller (LM), Orr-Sherby-Dorn (OSD), Manson-Haferd (MH), and Manson-Succop (MS) parameters. A stress-temperature linear model (STLM) based on the Arrhenius, Dorn, and Monkman-Grant equations was newly proposed through a mathematical derivation. In this model, the logarithm of the time to rupture depends linearly on both applied stress and temperature. The model parameters were determined by maximum likelihood estimation, and the model was applied to the creep data of Alloy 617. The STLM showed better agreement than Eno’s model and the LM parameter. In particular, the STLM gave good estimates when predicting the long-term creep life of Alloy 617.
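For orientation, the two kinds of model contrasted above can be written out. The first line is the standard Larson-Miller definition. The second is only an assumed linear form that matches the abstract's wording; the symbols $\beta_0$, $\beta_1$, $\beta_2$, the base-10 logarithm, and the use of $T$ rather than a transformed temperature are assumptions, since the abstract does not give the paper's exact parameterization.

```latex
% Larson-Miller parameter: T is absolute temperature, t_r is time to rupture,
% C is a material constant.
P_{\mathrm{LM}} = T \left( C + \log_{10} t_r \right)

% Assumed stress-temperature linear form: \sigma is applied stress and
% \beta_0, \beta_1, \beta_2 are fitted parameters.
\log_{10} t_r = \beta_0 + \beta_1 \sigma + \beta_2 T
```

In a TTP approach, stress is related to a single combined parameter, whereas the linear form treats stress and temperature as two separate regressors of log rupture time.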

Multiple Linear Regression Analysis of PV Power Forecasting for Evaluation and Selection of Suitable PV Sites (태양광 발전소 건설부지 평가 및 선정을 위한 선형회귀분석 기반 태양광 발전량 추정 모델)

  • Heo, Jae;Park, Bumsoo;Kim, Byungil;Han, SangUk
    • Korean Journal of Construction Engineering and Management / v.20 no.6 / pp.126-131 / 2019
  • Estimating the solar energy available at particular locations is critical for finding and assessing suitable PV sites. The amount of PV power generation is, however, affected by various geographical factors (e.g., weather), which can make it difficult to identify the complex relationship between influencing factors and power output and to apply findings from one study to different locations. This study thus undertakes a regression analysis using data collected from 172 PV plants spatially distributed across Korea to identify critical weather conditions and estimate the potential power generation of PV systems. The data include solar radiation, precipitation, fine dust, humidity, temperature, cloud amount, sunshine duration, and wind speed. The estimated PV power generation is then compared with the actual PV power generation to evaluate prediction performance. The proposed model achieves a MAPE of 11.696% and an R-squared of 0.979. All variables except humidity are found to be statistically significant in predicting the efficiency of PV power generation. Accordingly, this study may facilitate understanding of which weather conditions should be considered and the estimation of PV power generation when evaluating and selecting suitable locations for PV facilities.
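The two scores quoted above, MAPE and R-squared, can be computed from paired actual and estimated values as in the sketch below. It uses the common definitions of both metrics, which the abstract does not spell out, and the generation figures are made up for illustration.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up generation values for three hypothetical plants.
actual_kwh = [100.0, 200.0, 400.0]
estimated_kwh = [110.0, 190.0, 400.0]

model_mape = mape(actual_kwh, estimated_kwh)
model_r2 = r_squared(actual_kwh, estimated_kwh)
```

MAPE weights every plant equally in relative terms, while R-squared is dominated by the large plants, so the two scores can disagree about how good a model is.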

Learning Data Model Definition and Machine Learning Analysis for Data-Based Li-Ion Battery Performance Prediction (데이터 기반 리튬 이온 배터리 성능 예측을 위한 학습 데이터 모델 정의 및 기계학습 분석)

  • Byoungwook Kim;Ji Su Park;Hong-Jun Jang
    • KIPS Transactions on Software and Data Engineering / v.12 no.3 / pp.133-140 / 2023
  • The performance of lithium-ion batteries depends on the usage environment and the combination ratio of cathode materials. To develop a high-performance lithium-ion battery, it is necessary to manufacture batteries and measure their performance while varying the cathode material ratio. However, manufacturing batteries and measuring their performance for all combinations of variables takes a great deal of time and money. Research on predicting battery performance with artificial intelligence models has therefore been actively conducted. In existing published battery data, however, the measurement experiments were conducted with the same battery, so the cathode material combination ratio was fixed and was not included as a data attribute. In this paper, we define a training data model required to develop an artificial intelligence model that can predict battery performance according to the combination ratio of cathode materials. We analyzed the factors that can affect the performance of lithium-ion batteries and defined the mass of each cathode material and the battery usage environment (cycle, current, temperature, time) as input data and the battery power and capacity as target data. Across battery data from different experimental environments, each battery's data maintained a unique pattern, and the battery classification model classified each battery with an error of about 2%.

Construction of Urban Crime Prediction Model based on Census Using GWR (GWR을 이용한 센서스 기반 도시범죄 특성 분석 및 예측모델 구축)

  • YOO, Young-Woo;BAEK, Tae-Kyung
    • Journal of the Korean Association of Geographic Information Studies / v.20 no.4 / pp.65-76 / 2017
  • The purpose of this study was to present a prediction model that reflects crime risk area analysis, including factors and spatial characteristics, as a precursor to preparing an alternative plan for crime prevention and design. This analysis of criminal cases in high-risk areas revealed clusters in which approximately 25% of the cases within the study area occurred, distributed evenly throughout the region. This means that using a multiple linear regression model might overestimate the crime rate in some regions and underestimate in others. It also suggests that the number of deserted houses in an analyzed region has a negative relationship with the dependent variable, based on the multiple linear regression model results, and can also have different influences depending on the region. These results reveal that closure signs in a study area affect the dependent variable differently, depending on the region, rather than a simple or direct relationship with the dependent variable, as indicated by the results of the multiple linear regression model.

Development of Traffic Accidents Prediction Model With Fuzzy and Neural Network Theory (퍼지 및 신경망 이론을 이용한 교통사고예측모형 개발에 관한 연구)

  • Kim, Jang-Uk;Nam, Gung-Mun;Kim, Jeong-Hyeon;Lee, Su-Beom
    • Journal of Korean Society of Transportation / v.24 no.7 s.93 / pp.81-90 / 2006
  • Clarifying the relationship between traffic accidents and various influencing factors is important for reducing the number of traffic accidents. This study developed a traffic accident frequency prediction model using multiple linear regression and qualification theories, which are commonly applied in the field of traffic safety, to verify the influence of various factors on traffic accident frequency. The data were collected on Korean National Highway 17, which shows the highest accident frequencies and fatality rates in Chonbuk province. To minimize the uncertainty of the data, fuzzy theory and neural network theory were applied. Neural network theory can provide fair learning performance by mathematically modeling the human neural system. In conclusion, this study focused on the practicability of fuzzy reasoning theory and neural network theory for traffic safety analysis.