• Title/Summary/Keyword: 앙상블 기계학습

Search Result 78, Processing Time 0.025 seconds

Oversampling-Based Ensemble Learning Methods for Imbalanced Data (불균형 데이터 처리를 위한 과표본화 기반 앙상블 학습 기법)

  • Kim, Kyung-Min;Jang, Ha-Young;Zhang, Byoung-Tak
    • KIISE Transactions on Computing Practices
    • /
    • v.20 no.10
    • /
    • pp.549-554
    • /
    • 2014
  • Handwritten character recognition data is usually imbalanced because it is collected from the natural language sentences written by different writers. The imbalanced data can cause seriously negative effect on the performance of most of machine learning algorithms. But this problem is typically ignored in handwritten character recognition, because it is considered that most of difficulties in handwritten character recognition is caused by the high variance in data set and similar shapes between characters. We propose the oversampling-based ensemble learning methods to solve imbalanced data problem in handwritten character recognition and to improve the recognition accuracy. Also we show that proposed method achieved improvements in recognition accuracy of minor classes as well as overall recognition accuracy empirically.

Deep Learning-Based Brain Tumor Classification in MRI images using Ensemble of Deep Features

  • Kang, Jaeyong;Gwak, Jeonghwan
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.7
    • /
    • pp.37-44
    • /
    • 2021
  • Automatic classification of brain MRI images play an important role in early diagnosis of brain tumors. In this work, we present a deep learning-based brain tumor classification model in MRI images using ensemble of deep features. In our proposed framework, three different deep features from brain MR image are extracted using three different pre-trained models. After that, the extracted deep features are fed to the classification module. In the classification module, the three different deep features are first fed into the fully-connected layers individually to reduce the dimension of the features. After that, the output features from the fully-connected layers are concatenated and fed into the fully-connected layer to predict the final output. To evaluate our proposed model, we use openly accessible brain MRI dataset from web. Experimental results show that our proposed model outperforms other machine learning-based models.

Analysis of massive data in astronomy (천문학에서의 대용량 자료 분석)

  • Shin, Min-Su
    • The Korean Journal of Applied Statistics
    • /
    • v.29 no.6
    • /
    • pp.1107-1116
    • /
    • 2016
  • Recent astronomical survey observations have produced substantial amounts of data as well as completely changed conventional methods of analyzing astronomical data. Both classical statistical inference and modern machine learning methods have been used in every step of data analysis that range from data calibration to inferences of physical models. We are seeing the growing popularity of using machine learning methods in classical problems of astronomical data analysis due to low-cost data acquisition using cheap large-scale detectors and fast computer networks that enable us to share large volumes of data. It is common to consider the effects of inhomogeneous spatial and temporal coverage in the analysis of big astronomical data. The growing size of the data requires us to use parallel distributed computing environments as well as machine learning algorithms. Distributed data analysis systems have not been adopted widely for the general analysis of massive astronomical data. Gathering adequate training data is expensive in observation and learning data are generally collected from multiple data sources in astronomy; therefore, semi-supervised and ensemble machine learning methods will become important for the analysis of big astronomical data.

Place Recognition Using Ensemble Learning of Mobile Multimodal Sensory Information (모바일 멀티모달 센서 정보의 앙상블 학습을 이용한 장소 인식)

  • Lee, Chung-Yeon;Lee, Beom-Jin;On, Kyoung-Woon;Ha, Jung-Woo;Kim, Hong-Il;Zhang, Byoung-Tak
    • KIISE Transactions on Computing Practices
    • /
    • v.21 no.1
    • /
    • pp.64-69
    • /
    • 2015
  • Place awareness is an essential for location-based services that are widely provided to smartphone users. However, traditional GPS-based methods are only valid outdoors where the GPS signal is strong and also require symbolic place information of the physical location. In this paper, environmental sounds and images are used to recognize important aspects of each place. The proposed method extracts feature vectors from visual, auditory and location data recorded by a smartphone with built-in camera, microphone and GPS sensors modules. The heterogeneous feature vectors were then learned by an ensemble learning method that learns each group of feature vectors for each classifier respectively and votes to produce the highest weighted result. The proposed method is evaluated for place recognition using a data group of 3000 samples in six places and the experimental results show a remarkably improved recognition accuracy when using all kinds of sensory data comparing to results using data from a single sensor or audio-visual integrated data only.

Comparison of Seismic Data Interpolation Performance using U-Net and cWGAN (U-Net과 cWGAN을 이용한 탄성파 탐사 자료 보간 성능 평가)

  • Yu, Jiyun;Yoon, Daeung
    • Geophysics and Geophysical Exploration
    • /
    • v.25 no.3
    • /
    • pp.140-161
    • /
    • 2022
  • Seismic data with missing traces are often obtained regularly or irregularly due to environmental and economic constraints in their acquisition. Accordingly, seismic data interpolation is an essential step in seismic data processing. Recently, research activity on machine learning-based seismic data interpolation has been flourishing. In particular, convolutional neural network (CNN) and generative adversarial network (GAN), which are widely used algorithms for super-resolution problem solving in the image processing field, are also used for seismic data interpolation. In this study, CNN-based algorithm, U-Net and GAN-based algorithm, and conditional Wasserstein GAN (cWGAN) were used as seismic data interpolation methods. The results and performances of the methods were evaluated thoroughly to find an optimal interpolation method, which reconstructs with high accuracy missing seismic data. The work process for model training and performance evaluation was divided into two cases (i.e., Cases I and II). In Case I, we trained the model using only the regularly sampled data with 50% missing traces. We evaluated the model performance by applying the trained model to a total of six different test datasets, which consisted of a combination of regular, irregular, and sampling ratios. In Case II, six different models were generated using the training datasets sampled in the same way as the six test datasets. The models were applied to the same test datasets used in Case I to compare the results. We found that cWGAN showed better prediction performance than U-Net with higher PSNR and SSIM. However, cWGAN generated additional noise to the prediction results; thus, an ensemble technique was performed to remove the noise and improve the accuracy. The cWGAN ensemble model removed successfully the noise and showed improved PSNR and SSIM compared with existing individual models.

Modeling and Selecting Optimal Features for Machine Learning Based Detections of Android Malwares (머신러닝 기반 안드로이드 모바일 악성 앱의 최적 특징점 선정 및 모델링 방안 제안)

  • Lee, Kye Woong;Oh, Seung Taek;Yoon, Young
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.8 no.11
    • /
    • pp.427-432
    • /
    • 2019
  • In this paper, we propose three approaches to modeling Android malware. The first method involves human security experts for meticulously selecting feature sets. With the second approach, we choose 300 features with the highest importance among the top 99% features in terms of occurrence rate. The third approach is to combine multiple models and identify malware through weighted voting. In addition, we applied a novel method of eliminating permission information which used to be regarded as a critical factor for distinguishing malware. With our carefully generated feature sets and the weighted voting by the ensemble algorithm, we were able to reach the highest malware detection accuracy of 97.8%. We also verified that discarding the permission information lead to the improvement in terms of false positive and false negative rates.

Long-term runoff simulation using rainfall LSTM-MLP artificial neural network ensemble (LSTM - MLP 인공신경망 앙상블을 이용한 장기 강우유출모의)

  • An, Sungwook;Kang, Dongho;Sung, Janghyun;Kim, Byungsik
    • Journal of Korea Water Resources Association
    • /
    • v.57 no.2
    • /
    • pp.127-137
    • /
    • 2024
  • Physical models, which are often used for water resource management, are difficult to build and operate with input data and may involve the subjective views of users. In recent years, research using data-driven models such as machine learning has been actively conducted to compensate for these problems in the field of water resources, and in this study, an artificial neural network was used to simulate long-term rainfall runoff in the Osipcheon watershed in Samcheok-si, Gangwon-do. For this purpose, three input data groups (meteorological observations, daily precipitation and potential evapotranspiration, and daily precipitation - potential evapotranspiration) were constructed from meteorological data, and the results of training the LSTM (Long Short-term Memory) artificial neural network model were compared and analyzed. As a result, the performance of LSTM-Model 1 using only meteorological observations was the highest, and six LSTM-MLP ensemble models with MLP artificial neural networks were built to simulate long-term runoff in the Fifty Thousand Watershed. The comparison between the LSTM and LSTM-MLP models showed that both models had generally similar results, but the MAE, MSE, and RMSE of LSTM-MLP were reduced compared to LSTM, especially in the low-flow part. As the results of LSTM-MLP show an improvement in the low-flow part, it is judged that in the future, in addition to the LSTM-MLP model, various ensemble models such as CNN can be used to build physical models and create sulfur curves in large basins that take a long time to run and unmeasured basins that lack input data.

Logistic Regression Ensemble Method for Extracting Significant Information from Social Texts (소셜 텍스트의 주요 정보 추출을 위한 로지스틱 회귀 앙상블 기법)

  • Kim, So Hyeon;Kim, Han Joon
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.6 no.5
    • /
    • pp.279-284
    • /
    • 2017
  • Currenty, in the era of big data, text mining and opinion mining have been used in many domains, and one of their most important research issues is to extract significant information from social media. Thus in this paper, we propose a logistic regression ensemble method of finding the main body text from blog HTML. First, we extract structural features and text features from blog HTML tags. Then we construct a classification model with logistic regression and ensemble that can decide whether any given tags involve main body text or not. One of our important findings is that the main body text can be found through 'depth' features extracted from HTML tags. In our experiment using diverse topics of blog data collected from the web, our tag classification model achieved 99% in terms of accuracy, and it recalled 80.5% of documents that have tags involving the main body text.

Indoor positioning method using WiFi signal based on XGboost (XGboost 기반의 WiFi 신호를 이용한 실내 측위 기법)

  • Hwang, Chi-Gon;Yoon, Chang-Pyo;Kim, Dae-Jin
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.1
    • /
    • pp.70-75
    • /
    • 2022
  • Accurately measuring location is necessary to provide a variety of services. The data for indoor positioning measures the RSSI values from the WiFi device through an application of a smartphone. The measured data becomes the raw data of machine learning. The feature data is the measured RSSI value, and the label is the name of the space for the measured position. For this purpose, the machine learning technique is to study a technique that predicts the exact location only with the WiFi signal by applying an efficient technique to classification. Ensemble is a technique for obtaining more accurate predictions through various models than one model, including backing and boosting. Among them, Boosting is a technique for adjusting the weight of a model through a modeling result based on sampled data, and there are various algorithms. This study uses Xgboost among the above techniques and evaluates performance with other ensemble techniques.

Machine Learning Based Capacity Prediction Model of Terminal Maneuvering Area (기계학습 기반 접근관제구역 수용량 예측 모형)

  • Han, Sanghyok;Yun, Taegyeong;Kim, Sang Hyun
    • Journal of the Korean Society for Aeronautical & Space Sciences
    • /
    • v.50 no.3
    • /
    • pp.215-222
    • /
    • 2022
  • The purpose of air traffic flow management is to balance demand and capacity in the national airspace, and its performance relies on an accurate capacity prediction of the airport or airspace. This paper developed a regression model that predicts the number of aircraft actually departing and arriving in a terminal maneuvering area. The regression model is based on a boosting ensemble learning algorithm that learns past aircraft operational data such as time, weather, scheduled demand, and unfulfilled demand at a specific airport in the terminal maneuvering area. The developed model was tested using historical departure and arrival flight data at Incheon International Airport, and the coefficient of determination is greater than 0.95. Also, the capacity of the terminal maneuvering area of interest is implicitly predicted by using the model.