• Title/Abstract/Keywords: Model Ensemble

Search results: 638 items (processing time: 0.021 s)

Predicting movie audience with stacked generalization by combining machine learning algorithms

  • Park, Junghoon;Lim, Changwon
    • Communications for Statistical Applications and Methods / Vol. 28, No. 3 / pp. 217-232 / 2021
  • The Korean film industry has matured, and the number of movie viewings per capita has reached the highest level in the world. Since then, the industry's growth rate has been declining, and total annual movie sales even decreased slightly in 2018. The number of moviegoers is the primary driver of sales in the movie industry and also an important factor influencing additional sales. Thus, it is important to predict the number of movie audiences. In this study, we predict the cumulative audience of films using stacking, an ensemble method that combines all the algorithms used in the prediction. We use box office data from the Korea Film Council and web comment data from Daum Movie (www.movie.daum.net). This paper describes the collection and preprocessing of the explanatory variables and explains the regression models used in stacking. The final stacking model performs best on the test set in terms of RMSE.
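
The stacking idea mentioned in this abstract can be sketched as follows. This is a minimal illustration and not the authors' model: the two base learners (`fit_mean`, `fit_line`), the single-weight meta-learner, and the toy data are all assumptions made for the example. Base learners produce out-of-fold predictions, and the meta-learner is fitted on those predictions.

```python
# Minimal stacking (stacked generalization) sketch for 1-D regression.
# All names and data here are hypothetical and chosen for illustration only.

def fit_mean(xs, ys):
    """Base learner 1: always predicts the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Base learner 2: simple least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: mean_y + slope * (x - mean_x)

def fit_stack(xs, ys, k=4):
    """Fit both base learners and a one-parameter meta-learner.

    The meta-learner predicts w * mean_pred + (1 - w) * line_pred, with w
    chosen by least squares on out-of-fold base predictions.
    """
    n = len(xs)
    oof_mean, oof_line = [0.0] * n, [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        mean_model, line_model = fit_mean(tx, ty), fit_line(tx, ty)
        for i in held_out:
            oof_mean[i], oof_line[i] = mean_model(xs[i]), line_model(xs[i])
    # Closed-form least squares for w in: y - line = w * (mean - line)
    num = sum((a - b) * (y - b) for a, b, y in zip(oof_mean, oof_line, ys))
    den = sum((a - b) ** 2 for a, b in zip(oof_mean, oof_line))
    w = num / den if den else 0.5
    # Refit the base learners on all data for the final model.
    mean_model, line_model = fit_mean(xs, ys), fit_line(xs, ys)
    return (lambda x: w * mean_model(x) + (1 - w) * line_model(x)), w

xs = list(range(8))
ys = [2 * x + 1 for x in xs]
stacked_model, meta_weight = fit_stack(xs, ys)
```

On exactly linear toy data like this, the meta-learner puts essentially all of its weight on the line model, which is the behaviour one would hope for from a stacked ensemble.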

Ensemble variable selection using genetic algorithm

  • Seogyoung, Lee;Martin Seunghwan, Yang;Jongkyeong, Kang;Seung Jun, Shin
    • Communications for Statistical Applications and Methods / Vol. 29, No. 6 / pp. 629-640 / 2022
  • Variable selection is one of the most crucial tasks in supervised learning, such as regression and classification. Best subset selection is straightforward and optimal but not practically applicable unless the number of predictors is small. In this article, we propose directly solving best subset selection via the genetic algorithm (GA), a popular stochastic optimization algorithm based on the principle of Darwinian evolution. To further improve variable selection performance, we propose running multiple GAs to solve best subset selection and then synthesizing the results, which we call ensemble GA (EGA). The EGA significantly improves variable selection performance. In addition, the proposed method is essentially best subset selection and hence applicable to a variety of models with different selection criteria. We compare the proposed EGA to existing variable selection methods under various models, including linear regression, Poisson regression, and Cox regression for survival data. Both simulation and real data analysis demonstrate the promising performance of the proposed method.

Genetic classification of various familial relationships using the stacking ensemble machine learning approaches

  • Su Jin Jeong;Hyo-Jung Lee;Soong Deok Lee;Ji Eun Park;Jae Won Lee
    • Communications for Statistical Applications and Methods / Vol. 31, No. 3 / pp. 279-289 / 2024
  • Familial searching is a useful technique in a forensic investigation. Using genetic information, it is possible to identify individuals, determine familial relationships, and obtain racial/ethnic information. The total number of shared alleles (TNSA) and likelihood ratio (LR) methods have traditionally been used, and novel data-mining classification methods have recently been applied here as well. However, it is difficult to apply these methods to identify familial relationships above the third degree (e.g., uncle-nephew and first cousins). Therefore, we propose to apply a stacking ensemble machine learning algorithm to improve the accuracy of familial relationship identification. Using real data analysis, we obtain superior relationship identification results when applying meta-classifiers with a stacking algorithm rather than applying traditional TNSA or LR methods and data mining techniques.

A Dynamic Ensemble Method using Adaptive Weight Adjustment for Concept Drifting Streaming Data

  • 김영덕;박정희
    • Journal of KIISE / Vol. 44, No. 8 / pp. 842-853 / 2017
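  • Streaming data is a sequence of data generated continuously over time. The distribution or concept of the data may change over time, and such changes degrade the performance of a classification model. Incremental adaptive learning methods maintain the performance of a classification model under concept change by updating the current model with a weight adjusted according to the degree of concept change. However, it is difficult to determine a weight appropriate to the degree of concept change. In this paper, we propose a dynamic ensemble method based on adaptive weight adjustment according to concept change. Experimental results demonstrate that the proposed method achieves higher performance than the other compared methods.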
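
The general idea of reweighting ensemble members as the concept drifts can be sketched as below. This is not the method proposed in the paper: it is a simple multiplicative scheme in the style of the Weighted Majority algorithm, and the members, the penalty factor `beta`, and the toy stream are assumptions made for illustration.

```python
# Sketch of an ensemble whose member weights adapt on a stream.
# Swapped-in technique: multiplicative (Weighted Majority style) updates,
# NOT the paper's method. All names and data are hypothetical.

def weighted_vote(members, weights, x):
    """Return the label with the largest total member weight."""
    score = {}
    for member, weight in zip(members, weights):
        label = member(x)
        score[label] = score.get(label, 0.0) + weight
    return max(score, key=score.get)

def run_stream(members, stream, beta=0.5):
    """Process (x, y) pairs one at a time, shrinking the weight of wrong members."""
    weights = [1.0 / len(members)] * len(members)
    n_correct = 0
    for x, y in stream:
        if weighted_vote(members, weights, x) == y:
            n_correct += 1
        # Penalize every member that misclassified this example.
        weights = [w * (beta if m(x) != y else 1.0)
                   for m, w in zip(members, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]  # keep weights summing to 1
    return weights, n_correct

# Two hypothetical members, each matching one concept.
members = [lambda x: x > 0, lambda x: x <= 0]

# Toy stream: the labeling rule flips after the first 10 examples.
before_drift = [(x, x > 0) for x in [1, -1] * 5]
after_drift = [(x, x <= 0) for x in [1, -1] * 10]
final_weights, n_correct = run_stream(members, before_drift + after_drift)
```

After the drift, the weight mass moves from the first member to the second, so the weighted vote eventually follows the new concept.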

Context-aware Video Surveillance System

  • An, Tae-Ki;Kim, Moon-Hyun
    • Journal of Electrical Engineering and Technology / Vol. 7, No. 1 / pp. 115-123 / 2012
  • A video analysis system used to detect events in video streams generally has several processes, including object detection, object trajectory analysis, and recognition of the trajectories by comparison with an a priori trained model. However, these processes do not work well in a complex environment that has many occlusions, mirror effects, and/or shadow effects. We propose a new approach to a context-aware video surveillance system to detect predefined contexts in video streams. The proposed system consists of two modules: a feature extractor and a context recognizer. The feature extractor calculates the moving energy, which represents the amount of moving objects in a video stream, and the stationary energy, which represents the amount of still objects in a video stream. We represent situations and events as changes in the moving and stationary energies in video streams. The context recognizer determines whether predefined contexts are included in video streams using the moving and stationary energies extracted by the feature extractor. To train each context model and recognize predefined contexts in video streams, we propose and use DAdaBoost, a new ensemble classifier based on AdaBoost, which is one of the best-known ensemble classifier algorithms. Our proposed approach is expected to be a robust method in more complex environments that have a mirror effect and/or a shadow effect.
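
The abstract does not describe how DAdaBoost differs from AdaBoost, so the sketch below shows only standard discrete AdaBoost with one-dimensional decision stumps. The function names and the toy data are assumptions made for illustration.

```python
# Standard discrete AdaBoost with threshold stumps on 1-D data (labels +1/-1).
# This is the textbook algorithm, NOT DAdaBoost; names and data are hypothetical.
import math

def stump_predict(x, threshold, sign):
    """Predict `sign` when x >= threshold, otherwise `-sign`."""
    return sign if x >= threshold else -sign

def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, sign) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for sign in (1, -1):
            error = sum(w for w, x, y in zip(weights, xs, ys)
                        if stump_predict(x, threshold, sign) != y)
            if best is None or error < best[0]:
                best = (error, threshold, sign)
    return best

def adaboost(xs, ys, rounds=3):
    """Return a list of (alpha, threshold, sign) weak learners."""
    n = len(xs)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        error, threshold, sign = best_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - error) / error)
        model.append((alpha, threshold, sign))
        # Increase the weight of misclassified examples, decrease the rest.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, sign))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def predict(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, threshold, sign)
                for alpha, threshold, sign in model)
    return 1 if score >= 0 else -1

# Toy data that no single stump can classify perfectly.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
model = adaboost(xs, ys, rounds=3)
train_predictions = [predict(model, x) for x in xs]
```

The toy labels form a positive-negative-positive pattern, so every single stump makes at least two errors, while the weighted vote of three stumps fits all six training points.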

Boosted Regression Method based on Rejection Limits for Large-Scale Data

  • 권혁호;김승욱;최동훈;이기천
    • Journal of the Korean Institute of Industrial Engineers / Vol. 42, No. 4 / pp. 263-269 / 2016
  • The purpose of this study is to tackle a computational regression-type problem, namely the handling of large-scale data, for which conventional metamodeling techniques often fail in practice. For such problems, regression-type boosting, an ensemble modeling technique, combined with bootstrap-based resampling is a reasonable choice. This study proposes weight updates based on the magnitude of the residual itself and a new error decision criterion that builds the ensemble from models selected by rejection limits. Building on these ideas, we propose AdaBoost.RMU.R as a metamodeling technique suitable for large-scale data. To assess the performance of the proposed method against existing methods, we used six mathematical problems. For each problem, we computed the mean and standard deviation of the residuals between the actual and predicted response values. The results show that AdaBoost.RMU.R achieved better means and standard deviations than the other algorithms.

An Affordable Implementation of Kalman Filter by Eliminating the Explicit Temporal Evolution of the Background Error Covariance Matrix

  • 임규호;서애숙;하지현
    • Atmosphere / Vol. 23, No. 1 / pp. 33-37 / 2013
  • In meteorology, using the Kalman filter as a data assimilation system is virtually impossible because it requires both an adjoint model and large computing resources. The alternative, the ensemble Kalman filter, is affordable only at the cost of enormous computing resources. Furthermore, the latter employs sets of ensemble integrations to evolve the background error covariance matrix, compensating for the dynamical features of the temporal evolution of weather conditions. We propose a new implementation method that works without the adjoint model by utilizing the explicit expression of the background error covariance matrix in backward evolution. It will also break a barrier in the evolution of the covariance matrix. The method may be applied, with a slight modification, to real-time assimilation or retrospective analysis.
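
For orientation, the step this abstract refers to as the explicit temporal evolution of the background error covariance can be seen in a textbook scalar Kalman filter. The sketch below is the standard filter, not the implementation proposed in the paper; the parameter names and default values are assumptions made for illustration.

```python
# Textbook scalar Kalman filter cycle (forecast + analysis).
# NOT the paper's method; parameter names and defaults are hypothetical.

def kalman_step(x_a, p_a, obs, m=1.0, q=0.01, h=1.0, r=0.25):
    """One assimilation cycle.

    x_a, p_a : previous analysis state and its error variance
    obs      : new observation
    m, q     : linear model operator and model error variance
    h, r     : observation operator and observation error variance
    """
    # Forecast step: the background error variance is evolved explicitly.
    # In a full system this is M P M^T + Q, the expensive matrix operation.
    x_f = m * x_a
    p_f = m * p_a * m + q
    # Analysis step: blend forecast and observation using the Kalman gain.
    gain = p_f * h / (h * p_f * h + r)
    x_new = x_f + gain * (obs - h * x_f)
    p_new = (1.0 - gain * h) * p_f
    return x_new, p_new

# Toy usage: equal background and observation variances give a gain of 0.5.
state, variance = kalman_step(0.0, 1.0, obs=1.0, q=0.0, r=1.0)
```

In the scalar case the covariance evolution is a single multiplication; with a state vector of weather-model size it becomes a product of very large matrices, which is the cost the abstract is concerned with.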

An investigation of the structure of ensemble averaged extreme wind events

  • Scarabino, A.;Sterling, M.;Richards, P.J.;Baker, C.J.;Hoxey, R.P.
    • Wind and Structures / Vol. 10, No. 2 / pp. 135-151 / 2007
  • This paper examines the extreme gust profiles obtained by conditionally sampling full-scale velocity data obtained in the lower part of the atmospheric boundary layer. It is demonstrated that three different types of behaviour can be observed in the streamwise component of velocity. In all cases the corresponding vertical velocity component exhibits similar behaviour. An idealised horseshoe vortex model and a downburst model are investigated to examine if such structures can explain the behaviour observed. In addition, an empirical model is developed for an isolated gust corresponding to each of the three types of behaviour observed. It is possible that the division of the gust profile into three different types may lead to an improvement in the correlation of extreme gust events with respect to type.

Hybrid Feature Selection Method Based on Genetic Algorithm for the Diagnosis of Coronary Heart Disease

  • Wiharto, Wiharto;Suryani, Esti;Setyawan, Sigit;Putra, Bintang PE
    • Journal of information and communication convergence engineering / Vol. 20, No. 1 / pp. 31-40 / 2022
  • Coronary heart disease (CHD) is a comorbidity of COVID-19; therefore, routine early diagnosis is crucial. A large number of examination attributes in the context of diagnosing CHD is a distinct obstacle during the pandemic when the number of health service users is significant. The development of a precise machine learning model for diagnosis with a minimum number of examination attributes can allow examinations and healthcare actions to be undertaken quickly. This study proposes a CHD diagnosis model based on feature selection, data balancing, and ensemble-based classification methods. In the feature selection stage, a hybrid SVM-GA combined with fast correlation-based filter (FCBF) is used. The proposed system achieved an accuracy of 94.60% and area under the curve (AUC) of 97.5% when tested on the z-Alizadeh Sani dataset and used only 8 of 54 inspection attributes. In terms of performance, the proposed model can be placed in the very good category.

Taxi-demand forecasting using dynamic spatiotemporal analysis

  • Gangrade, Akshata;Pratyush, Pawel;Hajela, Gaurav
    • ETRI Journal / Vol. 44, No. 4 / pp. 624-640 / 2022
  • Taxi-demand forecasting and hotspot prediction can be critical in reducing response times and designing a cost-effective online taxi-booking model. Taxi demand in a region can be predicted by considering the past demand accumulated in that region over a span of time. However, other covariates, such as neighborhood influence, sociodemographic parameters, and point-of-interest data, may also influence the spatiotemporal variation of demand. To study the effects of these covariates, we propose three models that consider different covariates in order to select a set of independent variables. These models predict taxi demand in spatial units for a given temporal resolution using linear and ensemble regression. We then combine the characteristics (covariates) of each of these models to propose a robust forecasting framework, which we call the combined covariates model (CCM). Experimental results show that the CCM performs better than the other models proposed in this paper.