• Title/Abstract/Keywords: Regression tree algorithm

118 search results (processing time: 0.027 s)

Exploring the Management Component of Rural Small Business in the 6th Industry at Each Stage of Growth

  • 김정태
    • 벤처창업연구 / Vol. 12, No. 6 / pp. 123-138 / 2017
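  • This study examines the characteristics of 6th-industry business types at each stage of growth and aims to identify the core elements of management strategy at each stage. Data on 752 businesses certified as 6th-industry businesses in 2015 were analyzed with the CART algorithm for decision-tree analysis. The results show that the 6th-industry type was determined by the agricultural-product processing type in the initial growth stage; by the processing type, service type, region, and sales revenue in the growth stage; and by service strategy and the processing type in the maturity stage. By empirically identifying the core management elements that should be supported at each growth stage, these results suggest a direction for 6th-industry support policy.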
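The abstract names CART as the decision-tree algorithm. As a rough illustration of how CART chooses a split for classification, the sketch below scores every candidate threshold on every feature by weighted Gini impurity and keeps the lowest. The toy features (a 0/1 processing-type flag and sales revenue in arbitrary units) and the stage labels are hypothetical, not the study's data; a full CART implementation would also recurse on each child node and prune the tree.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) for the best split x[f] <= t."""
    n = len(rows)
    best = None
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue  # a split must send samples to both children
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


# Hypothetical rows: [agri_processing_flag, sales_in_arbitrary_units]
rows = [[0, 5.0], [0, 7.0], [1, 6.0], [1, 8.0]]
labels = ["early", "early", "mature", "mature"]
print(best_split(rows, labels))  # (0, 0, 0.0)
```

Here the flag separates the two stages perfectly, so the split on feature 0 has zero impurity and beats every threshold on sales.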


Machine learning model for residual chlorine prediction in sediment basin to control pre-chlorination in water treatment plant

  • 김주환;이경혁;김수전;김경훈
    • 한국수자원학회논문집 / Vol. 55, No. spc1 / pp. 1283-1293 / 2022
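  • This study aims to make use of the water quantity and quality data measured in water treatment processes and to make process control more intelligent. For treatment trains that include pre-chlorination, it builds a model that estimates the residual chlorine concentration in the sedimentation basin effluent so that this concentration can be stabilized. To predict it, a multiple regression model and three AI algorithms, namely a multilayer perceptron neural network, random forest, and long short-term memory (LSTM), were applied and their results compared and evaluated. The input variables were residual chlorine concentration, water temperature, turbidity, pH, electrical conductivity, flow rate, and alkalinity at a plant using pre-chlorination; the output variable was the sedimentation basin residual chlorine concentration required for stable basin operation under pre-chlorination. The random forest model performed best, followed by LSTM and then the multilayer perceptron. The multiple regression model, a mathematical model, showed the lowest goodness of fit. This is attributed not only to differences in numerical scale and dimension between the quantity and quality data but also to chlorine consumption responding very differently to seasonal water quality characteristics. Therefore, decision-tree structures such as random forest appear appropriate when applying AI algorithms to water treatment processes. Based on these results, real-time prediction of chlorine dosage at plants using pre-chlorination is expected to help keep the residual chlorine concentration in the sedimentation basin effluent constant.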
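The abstract credits the random forest's advantage to its decision-tree structure. The sketch below shows that structure in miniature under simplifying assumptions: each "tree" is a single-split regression stump that minimises squared error, it is trained on a bootstrap sample using a random subset of features, and the forest averages the stumps. Real random forests grow deep trees and resample features at every split; the inputs here are hypothetical stand-ins for variables such as water temperature and turbidity.

```python
import random
import statistics


def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = statistics.fmean(values)
    return sum((v - mean) ** 2 for v in values)


def fit_stump(rows, targets, features):
    """Depth-1 regression tree: pick the split x[f] <= t with the smallest total SSE."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, targets) if row[f] <= t]
            right = [y for row, y in zip(rows, targets) if row[f] > t]
            if not left or not right:
                continue
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, f, t, statistics.fmean(left), statistics.fmean(right))
    if best is None:  # no valid split (e.g. constant feature): predict the mean
        mean = statistics.fmean(targets)
        return lambda x: mean
    _, f, t, left_mean, right_mean = best
    return lambda x: left_mean if x[f] <= t else right_mean


def fit_forest(rows, targets, n_trees=25, max_features=1, seed=0):
    """Bagged stumps, each trained on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]       # bootstrap sample
        feats = rng.sample(range(n_features), max_features)  # random feature subset
        trees.append(fit_stump([rows[i] for i in idx], [targets[i] for i in idx], feats))
    return lambda x: statistics.fmean(tree(x) for tree in trees)  # average the trees


# Hypothetical inputs (temperature_index, turbidity_class) -> residual chlorine
rows = [(t, t % 3) for t in range(10)]
targets = [0.2 if t < 5 else 1.0 for t in range(10)]
forest = fit_forest(rows, targets)
print(forest((0, 0)), forest((9, 0)))
```

Stumps that happen to draw the uninformative second feature predict the same value for both test points, so the gap between the two predictions comes from the stumps that split on the first feature.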

An Assessment of a Random Forest Classifier for a Crop Classification Using Airborne Hyperspectral Imagery

  • Jeon, Woohyun;Kim, Yongil
    • 대한원격탐사학회지 / Vol. 34, No. 1 / pp. 141-150 / 2018
  • Crop type classification is essential for supporting agricultural decisions and resource monitoring. Remote sensing techniques, especially those using hyperspectral imagery, have been effective in agricultural applications. Hyperspectral imagery acquires many contiguous, narrow spectral bands over a wide spectral range. However, its high dimensionality leads to unreliable classifier estimates and a heavy computational burden, so the dimensionality of hyperspectral imagery must be reduced. In this study, the Random Forest (RF) classifier was used for both dimensionality reduction and classification. RF is an ensemble-learning algorithm built on the Classification and Regression Tree (CART) and has gained attention for its high classification accuracy and fast processing speed. The RF performance for crop classification with airborne hyperspectral imagery was assessed. The study area was the cultivated land in Chogye-myeon, Habcheon-gun, Gyeongsangnam-do, South Korea, where the main crops are garlic, onion, and wheat. Parameters were optimized to maximize classification accuracy, and the dimensionality was then reduced based on RF variable importance. The results show that the selected bands alone yield excellent classification accuracy without using the whole dataset. Moreover, most of the selected bands are concentrated in the visible (VIS) region, especially the region related to chlorophyll content. Therefore, it can be inferred that the phenological status after the mature stage influences red-edge spectral reflectance.
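The abstract reduces dimensionality with RF variable importance but does not say which importance measure was used. One common choice is permutation importance: shuffle one band's values and measure how much accuracy drops. The sketch assumes a fitted classifier is available as a plain Python function `model(row) -> label` (a hypothetical stand-in here) and keeps the `k` bands with the largest average drop.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows the model labels correctly."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, n_repeats=10, seed=0):
    """Mean accuracy drop after shuffling each feature (band) column."""
    rng = random.Random(seed)
    base = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            permuted = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(base - accuracy(model, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


def select_bands(importances, k):
    """Indices of the k most important bands, most important first."""
    return sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)[:k]


def model(row):
    """Hypothetical classifier that only looks at band 0."""
    return int(row[0] > 0.5)


rows = [(i / 19, (i * 7 % 20) / 19) for i in range(20)]  # two made-up bands
labels = [model(r) for r in rows]
imp = permutation_importance(model, rows, labels)
print(select_bands(imp, k=1))  # [0]
```

Because the toy model ignores band 1, shuffling it never changes a prediction and its importance is exactly zero.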

A sequential outlier detecting method using a clustering algorithm

  • 서한손;윤민
    • 응용통계연구 / Vol. 29, No. 4 / pp. 699-706 / 2016
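  • Outlier detection methods that omit a testing procedure are structurally vulnerable to swamping and masking effects, so they sometimes fail to detect multiple outliers properly. This study addresses a testing procedure to complement methods that flag small groups of observations separated by clustering as outliers. The usual approach is to perform various kinds of t-tests on the individual observations in the detected set of outlier candidates. We propose a sequential method that tests the set of outlier candidates and then changes the cutting criterion of the cluster tree to search for new groups of outliers. The proposed method is compared with existing methods through examples and simulations.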
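A minimal sketch of the sequential idea, under strong simplifying assumptions that are not taken from the paper: the data are one-dimensional; the cluster tree is single linkage, which in 1-D amounts to cutting the sorted data wherever a gap exceeds the cut height; candidates are the points outside the largest cluster when they form a small minority; and each candidate is checked with a simple standardized distance from the main cluster instead of the paper's t-tests. The cut heights, `minority_frac`, and `crit` are hypothetical parameters.

```python
import statistics


def cut_single_linkage_1d(values, height):
    """Cut a 1-D single-linkage tree: split the sorted data at every gap > height."""
    ordered = sorted(values)
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > height:
            clusters.append([cur])
        else:
            clusters[-1].append(cur)
    return clusters


def sequential_outliers(values, heights, minority_frac=0.25, crit=3.0):
    """Test candidate groups, drop confirmed outliers, then lower the cut height."""
    data = list(values)
    outliers = []
    for height in heights:  # heights are expected in decreasing order
        clusters = cut_single_linkage_1d(data, height)
        main = max(clusters, key=len)  # assumed to hold at least two points
        candidates = [x for c in clusters if c is not main for x in c]
        if not candidates or len(candidates) > minority_frac * len(data):
            continue  # no small minority group at this cut height
        mean, sd = statistics.fmean(main), statistics.stdev(main)
        confirmed = [x for x in candidates if abs(x - mean) / sd > crit]
        if not confirmed:
            break  # the candidate group fails the test: stop searching
        outliers.extend(confirmed)
        data = [x for x in data if x not in confirmed]
    return outliers


sample = [9.8, 10.0, 10.1, 10.2, 9.9, 10.3, 10.0, 9.7, 15.0, 15.2, 30.0]
print(sequential_outliers(sample, heights=[10.0, 3.0, 1.0]))  # [30.0, 15.0, 15.2]
```

In this toy run the first cut isolates only 30.0, while 15.0 and 15.2 stay inside the main cluster and inflate its spread (a masking effect); once 30.0 is removed, the lower cut separates and confirms them.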

Implementing Linear Models in Genetic Programming to Utilize Accumulated Data in Shipbuilding

  • 이경호;연윤석;양영순
    • 대한조선학회논문집 / Vol. 42, No. 5 / pp. 534-541 / 2005
  • Korean shipyards have accumulated a great amount of data, but they lack appropriate tools to use it in practical work. Engineering data embodies experts' experience and know-how, so it is very useful to extract knowledge or information from the accumulated data with data mining techniques. This paper treats evolutionary computation based on genetic programming (GP), which can be one of the components for realizing data mining. It deals with linear models in GP for regression or approximation problems when the available training samples are insufficient. The linear model, which is a function of unknown parameters, is built by extracting all possible basis functions from the standard GP tree with a symbolic processing algorithm. In addition to a standard linear model consisting of mathematical functions, the paper considers a variant that is built from low-order Taylor series and can be converted into the standard form of a polynomial. The proposed model can be used as a design tool to predict design parameters from small amounts of accumulated data.
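Once a set of basis functions has been extracted, a model that is linear in its unknown coefficients can be fitted by ordinary least squares. The sketch below assumes the basis functions are already available (here a hypothetical low-order polynomial set, in the spirit of the Taylor-series variant the abstract mentions) and solves the normal equations with Gaussian elimination; the symbolic extraction from the GP tree itself is not shown.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_linear_model(basis, xs, ys):
    """Least-squares coefficients c for y ~ sum_j c_j * basis[j](x)."""
    phi = [[f(x) for f in basis] for x in xs]  # design matrix
    k = len(basis)
    ata = [[sum(row[i] * row[j] for row in phi) for j in range(k)] for i in range(k)]
    aty = [sum(row[i] * y for row, y in zip(phi, ys)) for i in range(k)]
    return solve(ata, aty)


# Hypothetical basis {1, x, x^2} and noise-free data from y = 2 + 3x - 0.5x^2
basis = [lambda x: 1.0, lambda x: x, lambda x: x * x]
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2 + 3 * x - 0.5 * x * x for x in xs]
print([round(c, 6) for c in fit_linear_model(basis, xs, ys)])  # [2.0, 3.0, -0.5]
```

Solving the normal equations directly is fine for a handful of basis functions; with many correlated basis functions a QR- or SVD-based solver is usually more stable.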

Digital mapping of soil carbon stock in Jeolla province using cubist model

  • Park, Seong-Jin;Lee, Chul-Woo;Kim, Seong-Heon;Oh, Taek-Keun
    • 농업과학연구 / Vol. 47, No. 4 / pp. 1097-1107 / 2020
  • Assessment of soil carbon stock is essential for climate change mitigation and soil fertility. Digital soil mapping (DSM) is a well-known technique for estimating soil carbon stocks and updating existing soil maps. The aim of this study is to calculate the soil carbon stock in the topsoil layer (0 to 30 cm) of Jeolla Province, South Korea, using DSM. To predict the spatial distribution of carbon stock, we used Cubist, a data-mining algorithm based on regression trees. Soil samples (130 in total) were collected from three depths (0 to 10 cm, 10 to 20 cm, 20 to 30 cm), taking spatial distribution across Jeolla Province into account. The data were randomly divided into a calibration set (70%) and a validation set (30%). The results showed that clay content, topographic wetness index (TWI), and the digital elevation model (DEM) were the most important environmental covariates for predicting soil carbon stock. The predicted average soil carbon density was 3.88 kg·m⁻². The model's R² was 0.6, which is relatively high compared with a previous study. The total soil carbon stock at 0 to 30 cm depth in Jeolla Province was estimated at about 81 megatons.
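The reported figures can be cross-checked with simple unit arithmetic. The first function uses the common per-layer relation (carbon density = SOC concentration × bulk density × layer thickness, ignoring coarse fragments), which the abstract itself does not spell out; the sample layer values are hypothetical. The second inverts the reported totals: 81 Mt at an average of 3.88 kg·m⁻² implies a mapped area of roughly 20,900 km².

```python
def layer_carbon_density(soc_g_per_kg, bulk_density_g_cm3, thickness_cm):
    """Carbon density of one soil layer in kg C per m^2.

    (soc / 1000 kg C per kg soil) * (bd * 1000 kg soil per m^3) * (cm / 100 m)
    simplifies to soc * bd * cm / 100.
    """
    return soc_g_per_kg * bulk_density_g_cm3 * thickness_cm / 100.0


def implied_area_km2(total_megatons, mean_density_kg_m2):
    """Area (km^2) implied by a total stock and an average areal density."""
    total_kg = total_megatons * 1e9             # 1 Mt = 1e9 kg
    return total_kg / mean_density_kg_m2 / 1e6  # m^2 -> km^2


# Hypothetical 0-30 cm profile: three 10 cm layers as (SOC g/kg, bulk density g/cm^3)
layers = [(15.0, 1.2), (10.0, 1.3), (8.0, 1.4)]
print(round(sum(layer_carbon_density(s, bd, 10) for s, bd in layers), 2))  # 4.22
print(round(implied_area_km2(81, 3.88)))  # 20876
```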

Estimation of various amounts of kaolinite on concrete alkali-silica reactions using different machine learning methods

  • Aflatoonian, Moein;Mirhosseini, Ramin Tabatabaei
    • Structural Engineering and Mechanics / Vol. 83, No. 1 / pp. 79-92 / 2022
  • In this paper, the effect of a local pozzolanic kaolinite mine on concrete alkali-silica reaction (ASR) and strength has been evaluated. To prepare the samples, kaolinite powder at various dosages was used in the aggregate quality test specified by the ASTM C1260 standard, in order to investigate how kaolinite particles reduce the reaction of the mortar bars. Compressive strength, X-ray diffraction (XRD), and scanning electron microscope (SEM) tests were performed on concrete specimens. The results show that adding kaolinite powder to concrete causes a pozzolanic reaction and decreases the permeability of the samples compared with the reference specimen. In addition, various machine learning methods were used to predict ASR-induced expansion for different amounts of kaolinite. In the modeling process, the optimal method was taken to be the one with the lowest mean squared error (MSE) and, simultaneously, the highest correlation coefficient (R). Accordingly, to evaluate the efficiency of the proposed model, the results of the support vector machine (SVM) method were compared with those of the decision tree method, regression analysis, and a neural network algorithm. The comparison showed that SVM outperformed the other methods. Therefore, SVM can be considered an effective approach for predicting ASR-induced expansion.
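The model-selection rule in the abstract (lowest MSE together with highest R) can be made concrete as below. The measured values and predictions are made-up numbers in arbitrary units, not the paper's results; if no single model wins on both criteria, the sketch returns `None` rather than guessing a trade-off, since the abstract does not say how such a conflict would be resolved.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def pick_model(y_true, predictions):
    """Name of the model with both the lowest MSE and the highest R, else None."""
    by_mse = min(predictions, key=lambda name: mse(y_true, predictions[name]))
    by_r = max(predictions, key=lambda name: pearson_r(y_true, predictions[name]))
    return by_mse if by_mse == by_r else None


# Made-up expansion values and predictions from three hypothetical models
measured = [1.0, 2.0, 3.0, 4.0]
preds = {
    "svm": [1.1, 1.9, 3.0, 4.1],
    "tree": [1.5, 1.5, 3.5, 3.5],
    "regression": [2.0, 2.0, 3.0, 3.0],
}
print(pick_model(measured, preds))  # svm
```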

Differentiation among stability regimes of alumina-water nanofluids using smart classifiers

  • Daryayehsalameh, Bahador;Ayari, Mohamed Arselene;Tounsi, Abdelouahed;Khandakar, Amith;Vaferi, Behzad
    • Advances in nano research / Vol. 12, No. 5 / pp. 489-499 / 2022
  • Nanofluids have recently attracted substantial scientific interest as cooling media. However, their stability remains a challenge for successful use in industrial applications. Several factors, including temperature, the characteristics of the nanoparticles and base fluid, pH, ultrasonic power and frequency, agitation time, and surfactant type and concentration, determine the nanofluid stability regime. It is often complicated, or even impossible, to find accurately the conditions that yield a stable nanofluid. Furthermore, there are no empirical, semi-empirical, or even intelligent approaches for anticipating the stability of nanofluids. Therefore, this study introduces a straightforward and reliable intelligent classifier that discriminates among the stability regimes of alumina-water nanofluids based on Zeta potential margins. To this end, various intelligent classifiers (deep learning and multilayer perceptron neural networks, decision tree, GoogleNet, and multi-output least squares support vector regression) were designed and their classification accuracies compared. This comparison showed that the multilayer perceptron neural network (MLPNN) with the SoftMax activation function, trained by the Bayesian regularization algorithm, is the best classifier for the task. This classifier correctly detects the stability regime of more than 90% of 345 different nanofluid samples, with an overall classification accuracy of 90.1% and a misclassification rate of 9.9%. This research is the first attempt to anticipate the stability of water-alumina nanofluids from a few easily measured independent variables.
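For reference, a SoftMax output layer turns the network's raw scores into class probabilities, and the predicted stability regime is the most probable class. The sketch assumes hypothetical regime labels indexed 0 to 2 and made-up scores. It also checks that the reported 90.1% accuracy and 9.9% misclassification on 345 samples correspond to about 311 correct predictions (345 × 0.901 ≈ 311).

```python
import math


def softmax(scores):
    """Numerically stable softmax: subtract the max score before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(score_rows):
    """Index of the most probable class for each row of raw scores."""
    return [max(range(len(p)), key=p.__getitem__) for p in map(softmax, score_rows)]


def accuracy_pct(y_true, y_pred):
    """(accuracy %, misclassification %) for paired label lists."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    acc = 100.0 * correct / len(y_true)
    return acc, 100.0 - acc


print(predict([[2.0, 1.0, 0.1], [0.0, 3.0, 0.5]]))  # [0, 1]
# 311 correct out of 345 reproduces the reported percentages after rounding
acc, err = accuracy_pct([1] * 345, [1] * 311 + [0] * 34)
print(round(acc, 1), round(err, 1))  # 90.1 9.9
```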

Predicting Reports of Theft in Businesses via Machine Learning

  • JungIn, Seo;JeongHyeon, Chang
    • International Journal of Advanced Culture Technology / Vol. 10, No. 4 / pp. 499-510 / 2022
  • This study examines the factors behind reporting crimes against businesses in Korea and proposes a corresponding predictive model using machine learning. While many previous studies focused on the individual factors of theft victims, there is little evidence on the reporting factors for crimes against businesses that serve the public good, as opposed to those that protect private property. Therefore, we propose a crime prevention model based on the factors affecting businesses' willingness to report theft. This study used data collected through the 2015 Commercial Crime Damage Survey conducted by the Korea Institute for Criminal Policy, analyzing data from 834 businesses that had experienced theft during a 2016 crime investigation. The classes in the data were imbalanced; to address this, we jointly applied the Synthetic Minority Over-sampling Technique (SMOTE) and Tomek links to the training data. Two kinds of prediction models were implemented. One was a statistical model using logistic regression and elastic net; the other comprised a support vector machine model, tree-based machine learning models (e.g., random forest, extreme gradient boosting), and a stacking model. The results indicate that theft price, invasion, and remedy, features known to have significant effects on reporting theft offences, can serve as determinants for predicting such reporting in companies. Finally, we verified and compared the proposed predictive models using several popular metrics. Based on our evaluation of the importance of the features used in each model, we suggest a more accurate criterion for predicting var.
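A minimal pure-Python sketch of the two resampling steps named in the abstract, assuming numeric feature vectors. SMOTE creates synthetic minority samples on the line segment between a minority sample and one of its k nearest minority neighbours; a Tomek link is a pair of mutual nearest neighbours with different labels. Removing only the majority member of each link is one common variant. The order (SMOTE first, then Tomek cleaning), `k`, and the toy points are assumptions; production code would normally rely on a library such as imbalanced-learn.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Synthetic minority samples interpolated towards random near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> nb
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, nb)))
    return synthetic


def tomek_links(points, labels):
    """Index pairs (i, j), i < j, that are mutual nearest neighbours with different labels."""
    def nearest(i):
        return min((j for j in range(len(points)) if j != i),
                   key=lambda j: math.dist(points[i], points[j]))
    nn = [nearest(i) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]


def drop_majority_in_links(points, labels, majority):
    """Remove the majority-class member of every Tomek link."""
    drop = {k for pair in tomek_links(points, labels) for k in pair if labels[k] == majority}
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (5.5, 5.0)]
labels = [0, 0, 1, 0]
print(tomek_links(points, labels))  # [(2, 3)]
print(drop_majority_in_links(points, labels, majority=0))
# ([(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)], [0, 0, 1])
```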

A Comparative Study of Machine Learning Algorithms Using LID-DS DataSet

  • 박대경;류경준;신동일;신동규;박정찬;김진국
    • 정보처리학회논문지:소프트웨어 및 데이터공학 / Vol. 10, No. 3 / pp. 91-98 / 2021
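  • With the rapid development of information and communication technology, security has become increasingly important in IT infrastructure, and at the same time sophisticated and diverse cyberattacks such as Advanced Persistent Threats are on the rise. Defending against or predicting ever more sophisticated cyberattacks early is a critical issue, and many cases have been reported in which analysis of NIDS (Network-based Intrusion Detection System) data alone fails to stop rapidly mutating attacks. Consequently, data generated by intrusion detection systems are now used to defend against such attacks through HIDS (Host-based Intrusion Detection System) data analysis. In this paper, we conducted a comparative study of machine learning algorithms using LID-DS (Leipzig Intrusion Detection-Data Set), a host-based intrusion detection dataset that includes the thread information, metadata, and buffer data missing from previously used datasets. The algorithms used were Decision Tree, Naive Bayes, MLP (Multi-Layer Perceptron), Logistic Regression, LSTM (Long Short-Term Memory model), and RNN (Recurrent Neural Network). For evaluation, we measured Accuracy, Precision, Recall, and F1-Score, together with the error rate. The LSTM algorithm achieved the highest accuracy.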
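The evaluation metrics listed in the abstract have standard definitions for a binary attack-versus-normal task, sketched below. Treating label 1 as the attack (positive) class is an assumption, and the label lists are made up.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, F1 and error rate for one positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "error_rate": 1.0 - accuracy}


m = binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print({k: round(v, 3) for k, v in m.items()})
# accuracy, precision, recall and f1 are all 0.667; error_rate is 0.333
```

With two true positives, one false positive, and one false negative, precision and recall are both 2/3, so F1 equals them as well.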