• Title/Summary/Keyword: tree based learning

Search Result 429, Processing Time 0.036 seconds

A Case Study of Geometry Teaching and Learning based on Waldorf Education Methods in a Korean Alternative School (발도르프 수학교육 방법을 적용한 우리나라 대안학교 기하단원 교수·학습에 관한 사례연구)

  • Song, Man Ho;Kim, Young-Ok
    • East Asian mathematical journal
    • /
    • v.30 no.2
    • /
    • pp.197-222
    • /
    • 2014
  • The purpose of this research is to find out if it is possible to apply the Waldorf School's mathematics education method to Korean alternative schools which are run under the national curriculum. To achieve this, the researcher conducted class on geometry for three weeks with ten 7th graders(four girls and six boys) from Apple Tree Waldorf alternative school in Busan, which has adopted Valdorf education courses. For the first two weeks, the class was about 'fundamental geometrical construction', and then it was evaluated. On the third week, the lesson was on plane figures, followed by a test with 9 plane figure questions that are based on general middle school mathematics curriculum. The result shows that most of the students understood 'fundamental geometrical construction'. When it comes to the test on 'plane figures', seven students got 8 out of 9 right, two students got 6 out of 9 right, and one of them had difficulty solving the questions. According to the results of this research, it is thought that there will be no problem for students to understand mathematical concept even if the Waldorf School's mathematics education method is applied to Korean alternative schools. Also, the Waldorf School's mathematics education method is considered to be a good teaching model for the Korean mathematics curriculum which places emphasis on 'mathematical creativity' in regard to the curriculum and contents.

A new classification method using penalized partial least squares (벌점 부분최소자승법을 이용한 분류방법)

  • Kim, Yun-Dae;Jun, Chi-Hyuck;Lee, Hye-Seon
    • Journal of the Korean Data and Information Science Society
    • /
    • v.22 no.5
    • /
    • pp.931-940
    • /
    • 2011
  • Classification is to generate a rule of classifying objects into several categories based on the learning sample. Good classification model should classify new objects with low misclassification error. Many types of classification methods have been developed including logistic regression, discriminant analysis and tree. This paper presents a new classification method using penalized partial least squares. Penalized partial least squares can make the model more robust and remedy multicollinearity problem. This paper compares the proposed method with logistic regression and PCA based discriminant analysis by some real and artificial data. It is concluded that the new method has better power as compared with other methods.

An Assessment of a Random Forest Classifier for a Crop Classification Using Airborne Hyperspectral Imagery

  • Jeon, Woohyun;Kim, Yongil
    • Korean Journal of Remote Sensing
    • /
    • v.34 no.1
    • /
    • pp.141-150
    • /
    • 2018
  • Crop type classification is essential for supporting agricultural decisions and resource monitoring. Remote sensing techniques, especially using hyperspectral imagery, have been effective in agricultural applications. Hyperspectral imagery acquires contiguous and narrow spectral bands in a wide range. However, large dimensionality results in unreliable estimates of classifiers and high computational burdens. Therefore, reducing the dimensionality of hyperspectral imagery is necessary. In this study, the Random Forest (RF) classifier was utilized for dimensionality reduction as well as classification purpose. RF is an ensemble-learning algorithm created based on the Classification and Regression Tree (CART), which has gained attention due to its high classification accuracy and fast processing speed. The RF performance for crop classification with airborne hyperspectral imagery was assessed. The study area was the cultivated area in Chogye-myeon, Habcheon-gun, Gyeongsangnam-do, South Korea, where the main crops are garlic, onion, and wheat. Parameter optimization was conducted to maximize the classification accuracy. Then, the dimensionality reduction was conducted based on RF variable importance. The result shows that using the selected bands presents an excellent classification accuracy without using whole datasets. Moreover, a majority of selected bands are concentrated on visible (VIS) region, especially region related to chlorophyll content. Therefore, it can be inferred that the phenological status after the mature stage influences red-edge spectral reflectance.

Comparison of machine learning algorithms to evaluate strength of concrete with marble powder

  • Sharma, Nitisha;Upadhya, Ankita;Thakur, Mohindra S.;Sihag, Parveen
    • Advances in materials Research
    • /
    • v.11 no.1
    • /
    • pp.75-90
    • /
    • 2022
  • In this paper, functionality of soft computing algorithms such as Group method of data handling (GMDH), Random forest (RF), Random tree (RT), Linear regression (LR), M5P, and artificial neural network (ANN) have been looked out to predict the compressive strength of concrete mixed with marble powder. Assessment of result suggests that, the overall performance of ANN based model gives preferable results over the different applied algorithms for the estimate of compressive strength of concrete. The results of coefficient of correlation were maximum in ANN model (0.9139) accompanied through RT with coefficient of correlation (CC) value 0.8241 and minimum root mean square error (RMSE) value of ANN (4.5611) followed by RT with RMSE (5.4246). Similarly, other evaluating parameters like, Willmott's index and Nash-sutcliffe coefficient value of ANN was 0.9458 and 0.7502 followed by RT model (0.8763 and 0.6628). The end result showed that, for both subsets i.e., training and testing subset, ANN has the potential to estimate the compressive strength of concrete. Also, the results of sensitivity suggest that the water-cement ratio has a massive impact in estimating the compressive strength of concrete with marble powder with ANN based model in evaluation with the different parameters for this data set.

Prediction Model for Gastric Cancer via Class Balancing Techniques

  • Danish, Jamil ;Sellappan, Palaniappan;Sanjoy Kumar, Debnath;Muhammad, Naseem;Susama, Bagchi ;Asiah, Lokman
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.1
    • /
    • pp.53-63
    • /
    • 2023
  • Many researchers are trying hard to minimize the incidence of cancers, mainly Gastric Cancer (GC). For GC, the five-year survival rate is generally 5-25%, but for Early Gastric Cancer (EGC), it is almost 90%. Predicting the onset of stomach cancer based on risk factors will allow for an early diagnosis and more effective treatment. Although there are several models for predicting stomach cancer, most of these models are based on unbalanced datasets, which favours the majority class. However, it is imperative to correctly identify cancer patients who are in the minority class. This research aims to apply three class-balancing approaches to the NHS dataset before developing supervised learning strategies: Oversampling (Synthetic Minority Oversampling Technique or SMOTE), Undersampling (SpreadSubsample), and Hybrid System (SMOTE + SpreadSubsample). This study uses Naive Bayes, Bayesian Network, Random Forest, and Decision Tree (C4.5) methods. We measured these classifiers' efficacy using their Receiver Operating Characteristics (ROC) curves, sensitivity, and specificity. The validation data was used to test several ways of balancing the classifiers. The final prediction model was built on the one that did the best overall.

An Outlier Detection Algorithm and Data Integration Technique for Prediction of Hypertension (고혈압 예측을 위한 이상치 탐지 알고리즘 및 데이터 통합 기법)

  • Khongorzul Dashdondov;Mi-Hye Kim;Mi-Hwa Song
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2023.05a
    • /
    • pp.417-419
    • /
    • 2023
  • Hypertension is one of the leading causes of mortality worldwide. In recent years, the incidence of hypertension has increased dramatically, not only among the elderly but also among young people. In this regard, the use of machine-learning methods to diagnose the causes of hypertension has increased in recent years. In this study, we improved the prediction of hypertension detection using Mahalanobis distance-based multivariate outlier removal using the KNHANES database from the Korean national health data and the COVID-19 dataset from Kaggle. This study was divided into two modules. Initially, the data preprocessing step used merged datasets and decision-tree classifier-based feature selection. The next module applies a predictive analysis step to remove multivariate outliers using the Mahalanobis distance from the experimental dataset and makes a prediction of hypertension. In this study, we compared the accuracy of each classification model. The best results showed that the proposed MAH_RF algorithm had an accuracy of 82.66%. The proposed method can be used not only for hypertension but also for the detection of various diseases such as stroke and cardiovascular disease.

Forest Vertical Structure Mapping from Bi-Seasonal Sentinel-2 Images and UAV-Derived DSM Using Random Forest, Support Vector Machine, and XGBoost

  • Young-Woong Yoon;Hyung-Sup Jung
    • Korean Journal of Remote Sensing
    • /
    • v.40 no.2
    • /
    • pp.123-139
    • /
    • 2024
  • Forest vertical structure is vital for comprehending ecosystems and biodiversity, in addition to fundamental forest information. Currently, the forest vertical structure is predominantly assessed via an in-situ method, which is not only difficult to apply to inaccessible locations or large areas but also costly and requires substantial human resources. Therefore, mapping systems based on remote sensing data have been actively explored. Recently, research on analyzing and classifying images using machine learning techniques has been actively conducted and applied to map the vertical structure of forests accurately. In this study, Sentinel-2 and digital surface model images were obtained on two different dates separated by approximately one month, and the spectral index and tree height maps were generated separately. Furthermore, according to the acquisition time, the input data were separated into cases 1 and 2, which were then combined to generate case 3. Using these data, forest vetical structure mapping models based on random forest, support vector machine, and extreme gradient boost(XGBoost)were generated. Consequently, nine models were generated, with the XGBoost model in Case 3 performing the best, with an average precision of 0.99 and an F1 score of 0.91. We confirmed that generating a forest vertical structure mapping model utilizing bi-seasonal data and an appropriate model can result in an accuracy of 90% or higher.

Anomaly Detection for User Action with Generative Adversarial Networks (적대적 생성 모델을 활용한 사용자 행위 이상 탐지 방법)

  • Choi, Nam woong;Kim, Wooju
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.43-62
    • /
    • 2019
  • At one time, the anomaly detection sector dominated the method of determining whether there was an abnormality based on the statistics derived from specific data. This methodology was possible because the dimension of the data was simple in the past, so the classical statistical method could work effectively. However, as the characteristics of data have changed complexly in the era of big data, it has become more difficult to accurately analyze and predict the data that occurs throughout the industry in the conventional way. Therefore, SVM and Decision Tree based supervised learning algorithms were used. However, there is peculiarity that supervised learning based model can only accurately predict the test data, when the number of classes is equal to the number of normal classes and most of the data generated in the industry has unbalanced data class. Therefore, the predicted results are not always valid when supervised learning model is applied. In order to overcome these drawbacks, many studies now use the unsupervised learning-based model that is not influenced by class distribution, such as autoencoder or generative adversarial networks. In this paper, we propose a method to detect anomalies using generative adversarial networks. AnoGAN, introduced in the study of Thomas et al (2017), is a classification model that performs abnormal detection of medical images. It was composed of a Convolution Neural Net and was used in the field of detection. On the other hand, sequencing data abnormality detection using generative adversarial network is a lack of research papers compared to image data. Of course, in Li et al (2018), a study by Li et al (LSTM), a type of recurrent neural network, has proposed a model to classify the abnormities of numerical sequence data, but it has not been used for categorical sequence data, as well as feature matching method applied by salans et al.(2016). So it suggests that there are a number of studies to be tried on in the ideal classification of sequence data through a generative adversarial Network. In order to learn the sequence data, the structure of the generative adversarial networks is composed of LSTM, and the 2 stacked-LSTM of the generator is composed of 32-dim hidden unit layers and 64-dim hidden unit layers. The LSTM of the discriminator consists of 64-dim hidden unit layer were used. In the process of deriving abnormal scores from existing paper of Anomaly Detection for Sequence data, entropy values of probability of actual data are used in the process of deriving abnormal scores. but in this paper, as mentioned earlier, abnormal scores have been derived by using feature matching techniques. In addition, the process of optimizing latent variables was designed with LSTM to improve model performance. The modified form of generative adversarial model was more accurate in all experiments than the autoencoder in terms of precision and was approximately 7% higher in accuracy. In terms of Robustness, Generative adversarial networks also performed better than autoencoder. Because generative adversarial networks can learn data distribution from real categorical sequence data, Unaffected by a single normal data. But autoencoder is not. Result of Robustness test showed that he accuracy of the autocoder was 92%, the accuracy of the hostile neural network was 96%, and in terms of sensitivity, the autocoder was 40% and the hostile neural network was 51%. In this paper, experiments have also been conducted to show how much performance changes due to differences in the optimization structure of potential variables. As a result, the level of 1% was improved in terms of sensitivity. These results suggest that it presented a new perspective on optimizing latent variable that were relatively insignificant.

Development and Testing of a Machine Learning Model Using 18F-Fluorodeoxyglucose PET/CT-Derived Metabolic Parameters to Classify Human Papillomavirus Status in Oropharyngeal Squamous Carcinoma

  • Changsoo Woo;Kwan Hyeong Jo;Beomseok Sohn;Kisung Park;Hojin Cho;Won Jun Kang;Jinna Kim;Seung-Koo Lee
    • Korean Journal of Radiology
    • /
    • v.24 no.1
    • /
    • pp.51-61
    • /
    • 2023
  • Objective: To develop and test a machine learning model for classifying human papillomavirus (HPV) status of patients with oropharyngeal squamous cell carcinoma (OPSCC) using 18F-fluorodeoxyglucose (18F-FDG) PET-derived parameters in derived parameters and an appropriate combination of machine learning methods in patients with OPSCC. Materials and Methods: This retrospective study enrolled 126 patients (118 male; mean age, 60 years) with newly diagnosed, pathologically confirmed OPSCC, that underwent 18F-FDG PET-computed tomography (CT) between January 2012 and February 2020. Patients were randomly assigned to training and internal validation sets in a 7:3 ratio. An external test set of 19 patients (16 male; mean age, 65.3 years) was recruited sequentially from two other tertiary hospitals. Model 1 used only PET parameters, Model 2 used only clinical features, and Model 3 used both PET and clinical parameters. Multiple feature transforms, feature selection, oversampling, and training models are all investigated. The external test set was used to test the three models that performed best in the internal validation set. The values for area under the receiver operating characteristic curve (AUC) were compared between models. Results: In the external test set, ExtraTrees-based Model 3, which uses two PET-derived parameters and three clinical features, with a combination of MinMaxScaler, mutual information selection, and adaptive synthetic sampling approach, showed the best performance (AUC = 0.78; 95% confidence interval, 0.46-1). Model 3 outperformed Model 1 using PET parameters alone (AUC = 0.48, p = 0.047) and Model 2 using clinical parameters alone (AUC = 0.52, p = 0.142) in predicting HPV status. Conclusion: Using oversampling and mutual information selection, an ExtraTree-based HPV status classifier was developed by combining metabolic parameters derived from 18F-FDG PET/CT and clinical parameters in OPSCC, which exhibited higher performance than the models using either PET or clinical parameters alone.

Sleep Deprivation Attack Detection Based on Clustering in Wireless Sensor Network (무선 센서 네트워크에서 클러스터링 기반 Sleep Deprivation Attack 탐지 모델)

  • Kim, Suk-young;Moon, Jong-sub
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.31 no.1
    • /
    • pp.83-97
    • /
    • 2021
  • Wireless sensors that make up the Wireless Sensor Network generally have extremely limited power and resources. The wireless sensor enters the sleep state at a certain interval to conserve power. The Sleep deflation attack is a deadly attack that consumes power by preventing wireless sensors from entering the sleep state, but there is no clear countermeasure. Thus, in this paper, using clustering-based binary search tree structure, the Sleep deprivation attack detection model is proposed. The model proposed in this paper utilizes one of the characteristics of both attack sensor nodes and normal sensor nodes which were classified using machine learning. The characteristics used for detection were determined using Long Short-Term Memory, Decision Tree, Support Vector Machine, and K-Nearest Neighbor. Thresholds for judging attack sensor nodes were then learned by applying the SVM. The determined features were used in the proposed algorithm to calculate the values for attack detection, and the threshold for determining the calculated values was derived by applying SVM.Through experiments, the detection model proposed showed a detection rate of 94% when 35% of the total sensor nodes were attack sensor nodes and improvement of up to 26% in power retention.