Integrated Search | Korea Science

Predicting movie audience with stacked generalization by combining machine learning algorithms

Park, Junghoon;Lim, Changwon
- Communications for Statistical Applications and Methods / Vol. 28, No. 3 / pp. 217-232 / 2021
The Korean film industry has matured, and the number of movies watched per capita has reached the highest level in the world. Since then, the industry's growth rate has been declining, and total annual movie sales even decreased slightly in 2018. The number of moviegoers is the primary driver of sales in the movie industry and also an important factor influencing additional sales. It is therefore important to predict the size of movie audiences. In this study, we predict the cumulative audience of films using stacking, an ensemble method that combines all the algorithms used in the prediction. We use box office data from the Korea Film Council and web comment data from Daum Movie (www.movie.daum.net). This paper describes how the explanatory variables were collected and preprocessed and explains the regression models used in stacking. The final stacking model performs best on the test set in terms of RMSE.
https://doi.org/10.29220/CSAM.2021.28.3.217
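
The abstract above describes stacking only in general terms, so the following is a minimal sketch of the idea rather than the authors' models. It assumes two hypothetical base learners (a mean predictor and a one-variable least-squares line) and a linear meta-learner fitted on out-of-fold predictions; the toy data and all function names are illustrative assumptions.

```python
import math

def fit_mean(xs, ys):
    # Base learner 1 (hypothetical): always predicts the training mean.
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    # Base learner 2 (hypothetical): simple least-squares line in one variable.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

BASE_FITTERS = [fit_mean, fit_line]

def out_of_fold(xs, ys, k=3):
    # Each row holds the base learners' predictions for one sample,
    # made by models that did not see that sample during fitting.
    n = len(xs)
    preds = [[0.0] * len(BASE_FITTERS) for _ in range(n)]
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held = [i for i in range(n) if i % k == fold]
        for j, fit in enumerate(BASE_FITTERS):
            model = fit([xs[i] for i in train], [ys[i] for i in train])
            for i in held:
                preds[i][j] = model(xs[i])
    return preds

def fit_meta(preds, ys):
    # Meta-learner: two least-squares weights (no intercept), obtained from
    # the 2x2 normal equations. Assumes the two columns are not collinear.
    a = sum(p[0] * p[0] for p in preds)
    b = sum(p[0] * p[1] for p in preds)
    d = sum(p[1] * p[1] for p in preds)
    r0 = sum(p[0] * y for p, y in zip(preds, ys))
    r1 = sum(p[1] * y for p, y in zip(preds, ys))
    det = a * d - b * b
    return ((d * r0 - b * r1) / det, (a * r1 - b * r0) / det)

def fit_stack(xs, ys, k=3):
    weights = fit_meta(out_of_fold(xs, ys, k), ys)
    models = [fit(xs, ys) for fit in BASE_FITTERS]  # refit on all data
    return lambda x: sum(w * m(x) for w, m in zip(weights, models))

def rmse(ys, preds):
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))

# Toy data (an assumption for illustration): y = 2x + 1.
xs = [float(i) for i in range(9)]
ys = [2 * x + 1 for x in xs]
predict = fit_stack(xs, ys)
print(rmse(ys, [predict(x) for x in xs]))
```

Fitting the meta-learner on out-of-fold predictions, rather than in-sample ones, keeps it from simply rewarding the base learner that overfits most.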

Ensemble variable selection using genetic algorithm

Seogyoung, Lee;Martin Seunghwan, Yang;Jongkyeong, Kang;Seung Jun, Shin
- Communications for Statistical Applications and Methods / Vol. 29, No. 6 / pp. 629-640 / 2022
Variable selection is one of the most crucial tasks in supervised learning, such as regression and classification. Best subset selection is straightforward and optimal but not practically applicable unless the number of predictors is small. In this article, we propose directly solving the best subset selection problem via the genetic algorithm (GA), a popular stochastic optimization algorithm based on the principle of Darwinian evolution. To further improve variable selection performance, we propose running multiple GAs to solve the best subset selection problem and then synthesizing the results, which we call ensemble GA (EGA). The EGA significantly improves variable selection performance. In addition, the proposed method is essentially best subset selection and is hence applicable to a variety of models with different selection criteria. We compare the proposed EGA to existing variable selection methods under various models, including linear regression, Poisson regression, and Cox regression for survival data. Both simulation and real data analysis demonstrate the promising performance of the proposed method.
https://doi.org/10.29220/CSAM.2022.29.6.629
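
As a rough illustration of the idea in the abstract above (searching over variable subsets with a GA, then combining several runs), here is a minimal sketch. It is not the authors' EGA: the fitness function is a toy stand-in for a real selection criterion, the majority vote used to combine runs is an assumption (the abstract does not say how results are synthesized), and all names and parameter values are hypothetical.

```python
import random

# Hypothetical "true" subset, used only to build a toy fitness function.
TRUE_MASK = [1, 1, 0, 0, 1, 0]

def toy_fitness(mask):
    # Stand-in for a model selection criterion (e.g. a negated information
    # criterion): higher is better, maximal when mask equals TRUE_MASK.
    return -sum(1 for a, b in zip(mask, TRUE_MASK) if a != b)

def run_ga(fitness, n_vars, pop_size=20, generations=30, p_mut=0.05, seed=0):
    rng = random.Random(seed)
    # Each individual is a 0/1 mask: 1 means the variable is selected.
    pop = [[rng.randint(0, 1) for _ in range(n_vars)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        next_pop = [pop[0][:]]             # elitism: keep the current best
        parents = pop[: pop_size // 2]     # truncation selection
        while len(next_pop) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_vars)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability p_mut per variable.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness), history

def ensemble_ga(fitness, n_vars, n_runs=5):
    # Run several independent GAs and keep a variable if a strict
    # majority of runs selected it (an assumed combination rule).
    runs = [run_ga(fitness, n_vars, seed=s)[0] for s in range(n_runs)]
    return [1 if 2 * sum(r[j] for r in runs) > n_runs else 0
            for j in range(n_vars)]

print(ensemble_ga(toy_fitness, len(TRUE_MASK)))
```

In practice `toy_fitness` would be replaced by a function that fits the model on the selected variables and returns the chosen criterion, which is what makes the approach applicable across model families.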

Genetic classification of various familial relationships using the stacking ensemble machine learning approaches

Su Jin Jeong;Hyo-Jung Lee;Soong Deok Lee;Ji Eun Park;Jae Won Lee
- Communications for Statistical Applications and Methods / Vol. 31, No. 3 / pp. 279-289 / 2024
Familial searching is a useful technique in a forensic investigation. Using genetic information, it is possible to identify individuals, determine familial relationships, and obtain racial/ethnic information. The total number of shared alleles (TNSA) and likelihood ratio (LR) methods have traditionally been used, and novel data-mining classification methods have recently been applied here as well. However, it is difficult to apply these methods to identify familial relationships above the third degree (e.g., uncle-nephew and first cousins). Therefore, we propose to apply a stacking ensemble machine learning algorithm to improve the accuracy of familial relationship identification. Using real data analysis, we obtain superior relationship identification results when applying meta-classifiers with a stacking algorithm rather than applying traditional TNSA or LR methods and data mining techniques.
https://doi.org/10.29220/CSAM.2024.31.3.279

A Dynamic Ensemble Method using Adaptive Weight Adjustment for Concept Drifting Streaming Data

김영덕;박정희
- Journal of KIISE (정보과학회 논문지) / Vol. 44, No. 8 / pp. 842-853 / 2017
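Streaming data is a sequence of data generated continuously over time. The distribution or concept of the data may change as time passes, and such changes degrade the performance of a classification model. Incremental adaptive learning methods maintain classification performance under concept change by adjusting the weight of the current classification model according to the degree of concept change when performing updates. However, it is difficult to determine a weight appropriate to the degree of concept change. In this paper, we propose a dynamic ensemble method based on adaptive weight adjustment in response to concept change. Experimental results demonstrate that the proposed method achieves higher performance than the other methods compared.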
https://doi.org/10.5626/JOK.2017.44.8.842
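
The abstract above does not spell out its weight-adjustment rule, so the sketch below shows only a generic weighted-majority-style update, which is one common way to reweight ensemble members on a stream. The penalty factor `beta` and both function names are assumptions for illustration, not the paper's method.

```python
def update_weights(weights, correct, beta=0.5):
    # Shrink the weight of every member that misclassified the latest
    # sample by the factor beta, then renormalize so the weights sum to 1.
    scaled = [w * (1.0 if ok else beta) for w, ok in zip(weights, correct)]
    total = sum(scaled)
    return [w / total for w in scaled]

def weighted_vote(weights, votes):
    # Return the label with the largest total weight among the members' votes.
    score = {}
    for w, v in zip(weights, votes):
        score[v] = score.get(v, 0.0) + w
    return max(score, key=score.get)

# Toy stream step: three members vote, the true label arrives, weights adapt.
weights = [1 / 3, 1 / 3, 1 / 3]
votes = ["a", "b", "b"]
true_label = "a"
print(weighted_vote(weights, votes))
weights = update_weights(weights, [v == true_label for v in votes])
print(weights)
```

After a concept change, members trained on the old concept keep erring and lose influence, while members that track the new concept gain relative weight.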

Context-aware Video Surveillance System

An, Tae-Ki;Kim, Moon-Hyun
- Journal of Electrical Engineering and Technology / Vol. 7, No. 1 / pp. 115-123 / 2012
A video analysis system used to detect events in video streams generally has several processes, including object detection, object trajectory analysis, and recognition of the trajectories by comparison with an a priori trained model. However, these processes do not work well in a complex environment that has many occlusions, mirror effects, and/or shadow effects. We propose a new approach to a context-aware video surveillance system to detect predefined contexts in video streams. The proposed system consists of two modules: a feature extractor and a context recognizer. The feature extractor calculates the moving energy, which represents the amount of moving objects in a video stream, and the stationary energy, which represents the amount of still objects in a video stream. We represent situations and events as changes in the moving and stationary energies in video streams. The context recognizer determines whether predefined contexts are included in video streams using the moving and stationary energies extracted by the feature extractor. To train each context model and recognize predefined contexts in video streams, we propose and use a new ensemble classifier, DAdaBoost, based on the AdaBoost algorithm, which is one of the most famous ensemble classifier algorithms. Our proposed approach is expected to be a robust method in more complex environments that have a mirror effect and/or a shadow effect.
https://doi.org/10.5370/JEET.2012.7.1.115

Boosted Regression Method based on Rejection Limits for Large-Scale Data

권혁호;김승욱;최동훈;이기천
- Journal of the Korean Institute of Industrial Engineers (대한산업공학회지) / Vol. 42, No. 4 / pp. 263-269 / 2016
The purpose of this study is to address a computational regression-type problem, namely handling large-scale data, for which conventional metamodeling techniques often fail in practice. To solve such problems, regression-type boosting, an ensemble modeling technique, combined with bootstrap-based resampling is a reasonable choice. This study suggests updating weights by the amount of the residual itself and introduces a new error decision criterion that constructs an ensemble of models selectively chosen by rejection limits. Based on these ideas, we propose AdaBoost.RMU.R as a metamodeling technique suitable for handling large-scale data. To assess the performance of the proposed method against some existing methods, we used 6 mathematical problems. For each problem, we computed the average and the standard deviation of the residuals between the real response values and the predicted response values. The results show that the average and the standard deviation for AdaBoost.RMU.R were better than those of the other algorithms.
https://doi.org/10.7232/JKIIE.2016.42.4.263

An Affordable Implementation of Kalman Filter by Eliminating the Explicit Temporal Evolution of the Background Error Covariance Matrix

임규호;서애숙;하지현
- Atmosphere (대기) / Vol. 23, No. 1 / pp. 33-37 / 2013
In meteorology, using the Kalman filter as a data assimilation system is virtually impossible because it simultaneously requires an adjoint model and large computing resources. The alternative, the ensemble Kalman filter, is affordable only at the cost of enormous computing resources. Furthermore, the latter employs sets of ensemble integrations to evolve the background error covariance matrix, compensating for the dynamical features of the temporal evolution of weather conditions. We propose a new implementation method that works without the adjoint model by utilizing the explicit expression of the background error covariance matrix in backward evolution. It also removes a barrier in the evolution of the covariance matrix. The method may be applied, with slight modification, to real-time assimilation or retrospective analysis.
https://doi.org/10.14191/Atmos.2013.23.1.033

An investigation of the structure of ensemble averaged extreme wind events

Scarabino, A.;Sterling, M.;Richards, P.J.;Baker, C.J.;Hoxey, R.P.
- Wind and Structures / Vol. 10, No. 2 / pp. 135-151 / 2007
This paper examines the extreme gust profiles obtained by conditionally sampling full-scale velocity data obtained in the lower part of the atmospheric boundary layer. It is demonstrated that three different types of behaviour can be observed in the streamwise component of velocity. In all cases the corresponding vertical velocity component illustrates similar behaviour. An idealised horseshoe vortex model and a downburst model are investigated to examine if such structures can explain the behaviour observed. In addition, an empirical model is developed for an isolated gust corresponding to each of the three types of behaviour observed. It is possible that the division of the gust profile into three different types may lead to an improvement in the correlation of extreme gust events with respect to type.
https://doi.org/10.12989/was.2007.10.2.135

Hybrid Feature Selection Method Based on Genetic Algorithm for the Diagnosis of Coronary Heart Disease

Wiharto, Wiharto;Suryani, Esti;Setyawan, Sigit;Putra, Bintang PE
- Journal of information and communication convergence engineering / Vol. 20, No. 1 / pp. 31-40 / 2022
Coronary heart disease (CHD) is a comorbidity of COVID-19; therefore, routine early diagnosis is crucial. A large number of examination attributes in the context of diagnosing CHD is a distinct obstacle during the pandemic when the number of health service users is significant. The development of a precise machine learning model for diagnosis with a minimum number of examination attributes can allow examinations and healthcare actions to be undertaken quickly. This study proposes a CHD diagnosis model based on feature selection, data balancing, and ensemble-based classification methods. In the feature selection stage, a hybrid SVM-GA combined with fast correlation-based filter (FCBF) is used. The proposed system achieved an accuracy of 94.60% and area under the curve (AUC) of 97.5% when tested on the z-Alizadeh Sani dataset and used only 8 of 54 inspection attributes. In terms of performance, the proposed model can be placed in the very good category.
https://doi.org/10.6109/jicce.2022.20.1.31

Taxi-demand forecasting using dynamic spatiotemporal analysis

Gangrade, Akshata;Pratyush, Pawel;Hajela, Gaurav
- ETRI Journal / Vol. 44, No. 4 / pp. 624-640 / 2022
Taxi-demand forecasting and hotspot prediction can be critical in reducing response times and designing a cost-effective online taxi-booking model. Taxi demand in a region can be predicted by considering the past demand accumulated in that region over a span of time. However, other covariates, such as neighborhood influence, sociodemographic parameters, and point-of-interest data, may also influence the spatiotemporal variation of demand. To study the effects of these covariates, we propose three models that consider different covariates in order to select a set of independent variables. These models predict taxi demand in spatial units for a given temporal resolution using linear and ensemble regression. We then combine the characteristics (covariates) of each of these models to propose a robust forecasting framework, which we call the combined covariates model (CCM). Experimental results show that the CCM performs better than the other models proposed in this paper.
https://doi.org/10.4218/etrij.2021-0123

