Integrated Search | Korea Science

Predicting movie audience with stacked generalization by combining machine learning algorithms

Park, Junghoon;Lim, Changwon
- Communications for Statistical Applications and Methods / Vol. 28, No. 3 / pp. 217-232 / 2021
The Korean film industry has matured, and per-capita movie attendance has reached the highest level in the world. Since then, the industry's growth rate has been declining, and total annual movie sales even decreased slightly in 2018. The number of moviegoers is the primary driver of sales in the movie industry and also an important factor influencing additional sales, so predicting the number of movie audiences is important. In this study, we predict the cumulative number of audiences of films using stacking, an ensemble method that combines the predictions of all the algorithms used in the prediction. We use box office data from the Korea Film Council and web comment data from Daum Movie (www.movie.daum.net). This paper describes the process of collecting and preprocessing the explanatory variables and explains the regression models used in stacking. The final stacking model performs best on the test set in terms of RMSE.
https://doi.org/10.29220/CSAM.2021.28.3.217
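Stacking (stacked generalization) fits a meta-learner on out-of-fold predictions of the base models, so the meta-learner never sees a base prediction made on that model's own training data. The following is a minimal pure-Python sketch of that idea, not the paper's pipeline: the two base learners (a least-squares line and k-nearest-neighbour regression), the 5-fold split, and the ridge-stabilised linear meta-learner are all illustrative assumptions, and every function name is hypothetical.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Base learner 1 (illustrative): least-squares line y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def fit_knn(xs, ys, k=3):
    """Base learner 2 (illustrative): mean target of the k nearest training points."""
    pts = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pts, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)

    return predict


BASE_LEARNERS = [fit_line, fit_knn]


def solve(a, b):
    """Solve the square linear system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_stacking(xs, ys, n_folds=5, ridge=1e-6, seed=0):
    """Fit base learners out-of-fold, then a linear meta-learner on their predictions."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    # Level-1 features: each base learner's prediction for a point, made without that point.
    oof = [[0.0] * len(BASE_LEARNERS) for _ in xs]
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        tr_x, tr_y = [xs[i] for i in train], [ys[i] for i in train]
        for j, learner in enumerate(BASE_LEARNERS):
            model = learner(tr_x, tr_y)
            for i in fold:
                oof[i][j] = model(xs[i])
    # Meta-learner: least squares with intercept; a tiny ridge term keeps X^T X invertible.
    design = [[1.0] + row for row in oof]
    p = len(design[0])
    xtx = [[sum(r[u] * r[v] for r in design) + (ridge if u == v else 0.0) for v in range(p)]
           for u in range(p)]
    xty = [sum(r[u] * y for r, y in zip(design, ys)) for u in range(p)]
    w = solve(xtx, xty)
    # Refit the base learners on all data for prediction time.
    full = [learner(xs, ys) for learner in BASE_LEARNERS]
    return lambda x: w[0] + sum(wj * m(x) for wj, m in zip(w[1:], full))
```

On perfectly linear toy data such as `ys = [2 * x + 1 for x in range(20)]`, the meta-learner puts almost all of its weight on the line learner. Refitting the base learners on all the data after the out-of-fold step is one common convention; another is to average the per-fold models.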

Ensemble UNet 3+ for Medical Image Segmentation

JongJin, Park
- International Journal of Internet, Broadcasting and Communication / Vol. 15, No. 1 / pp. 269-274 / 2023
In this paper, we propose a new UNet 3+ model for medical image segmentation. The proposed ensemble (E) UNet 3+ model combines UNet 3+ networks of varying depths into one unified architecture. These networks share the same encoder but have their own decoders, which helps bridge the semantic gap between the encoder and decoder nodes of UNet 3+. Deep supervision was applied to a total of 8 nodes of the E-UNet 3+ to improve performance. The proposed E-UNet 3+ model shows better segmentation results than UNet 3+. In the simulations, the E-UNet 3+ model with deep supervision performed best on the training and validation data, with loss function values of 0.8904 and 0.8562, respectively. On the test data, the UNet 3+ model with deep supervision performed best, with a value of 0.7406. A qualitative comparison of the simulation results shows that the proposed model produces better results than the existing UNet 3+.
https://doi.org/10.7236/IJIBC.2023.15.1.269
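Deep supervision attaches a loss to each supervised output of the network and optimises their combination. A minimal sketch, assuming a soft Dice loss, masks flattened into Python lists, and equal weights across the supervised nodes (the abstract states that 8 nodes are supervised but not which loss or weighting is used); the function names are hypothetical.

```python
def soft_dice_loss(pred, target, eps=1e-7):
    """1 - soft Dice coefficient between a predicted probability map and a binary mask."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def deep_supervision_loss(side_outputs, target, weights=None):
    """Weighted mean of the per-output losses over all supervised decoder nodes."""
    if weights is None:
        weights = [1.0] * len(side_outputs)  # assumption: every node counts equally
    total = sum(w * soft_dice_loss(out, target) for w, out in zip(weights, side_outputs))
    return total / sum(weights)
```

Non-uniform weights (for example, down-weighting the shallowest outputs) are a common variant; the sketch leaves that as an optional argument rather than guessing the paper's choice.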

Ensemble variable selection using genetic algorithm

Seogyoung, Lee;Martin Seunghwan, Yang;Jongkyeong, Kang;Seung Jun, Shin
- Communications for Statistical Applications and Methods / Vol. 29, No. 6 / pp. 629-640 / 2022
Variable selection is one of the most crucial tasks in supervised learning, such as regression and classification. Best subset selection is straightforward and optimal but not practically applicable unless the number of predictors is small. In this article, we propose solving best subset selection directly via the genetic algorithm (GA), a popular stochastic optimization algorithm based on the principle of Darwinian evolution. To further improve variable selection performance, we propose running multiple GAs to solve best subset selection and then synthesizing the results, which we call ensemble GA (EGA). EGA significantly improves variable selection performance. In addition, since the proposed method is essentially best subset selection, it is applicable to a variety of models with different selection criteria. We compare the proposed EGA with existing variable selection methods under various models, including linear regression, Poisson regression, and Cox regression for survival data. Both simulation and real data analysis demonstrate the promising performance of the proposed method.
https://doi.org/10.29220/CSAM.2022.29.6.629
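The EGA idea can be sketched as several independent GA runs over binary inclusion masks whose results are then combined. This is a minimal illustration, not the authors' algorithm: the operators (tournament selection, one-point crossover, bit-flip mutation, elitism), the hyperparameters, and the rule "keep a variable if at least half of the runs select it" are assumptions, since the abstract does not say how the runs are synthesized. In practice `fitness` would be a model-selection criterion computed from a fitted model, for example a negated information criterion.

```python
import random


def run_ga(fitness, n_vars, pop_size=30, generations=60, p_mut=0.05, seed=0):
    """One GA run over 0/1 inclusion masks (n_vars >= 2); higher fitness is better."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_vars)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        next_pop = [ranked[0][:], ranked[1][:]]  # elitism: carry over the two best masks
        while len(next_pop) < pop_size:
            # Tournament selection: best of 3 random individuals, twice.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n_vars)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each gene flips with probability p_mut.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)


def ensemble_ga(fitness, n_vars, n_runs=5, threshold=0.5):
    """Run several GAs with different seeds; keep variables chosen in >= threshold of runs."""
    masks = [run_ga(fitness, n_vars, seed=s) for s in range(n_runs)]
    freq = [sum(m[j] for m in masks) / n_runs for j in range(n_vars)]
    return [j for j in range(n_vars) if freq[j] >= threshold]
```

For example, with a hypothetical toy fitness that penalizes each mismatch against the subset {0, 3, 5}, `ensemble_ga(toy_fitness, 10)` is expected to return `[0, 3, 5]`. Voting across runs makes the final selection less sensitive to a single run that stalls in a poor local optimum.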

Tor Network Website Fingerprinting Using Statistical-Based Feature and Ensemble Learning of Traffic Data

김준호;김원겸;황두성
- KIPS Transactions on Software and Data Engineering / Vol. 9, No. 6 / pp. 187-194 / 2020
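This paper proposes a website fingerprinting method that uses ensemble learning on the Tor network, which guarantees client anonymity and privacy. We formulate training problems for website fingerprinting from traffic packets collected on the Tor network and compare the performance of website fingerprinting systems that apply tree-based ensemble models. Training feature vectors are prepared from general information, bursts, cell sequence length, and cell order extracted from traffic sequences, and the features of each website are represented with a fixed length. For experimental evaluation, we define four learning problems according to how website fingerprinting is used (Wang14, BW, CW_T, CW_H) and compare performance with a support vector machine model using the CUMUL feature vector. In the experimental evaluation, the proposed statistics-based training feature representation outperforms the CUMUL feature representation in all cases except BW.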
https://doi.org/10.3745/KTSDE.2020.9.6.187
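Tree-based ensembles need every trace mapped to a vector of the same length. A minimal sketch of that step, assuming a trace is given as a sequence of cell directions (+1 outgoing, -1 incoming) and using only a few illustrative statistics: total, outgoing, and incoming cell counts, the number of bursts, and the first few signed burst lengths, zero-padded. The paper's exact feature set is not listed in the abstract, and the function names are hypothetical.

```python
def burst_lengths(directions):
    """Signed lengths of runs of consecutive same-direction cells (+ outgoing, - incoming)."""
    runs = []
    for d in directions:
        if runs and runs[-1][0] == d:
            runs[-1][1] += 1
        else:
            runs.append([d, 1])
    return [d * n for d, n in runs]


def trace_features(directions, n_bursts=8):
    """Fixed-length vector: general counts followed by the first n_bursts bursts, zero-padded."""
    all_bursts = burst_lengths(directions)
    n_out = sum(1 for d in directions if d > 0)
    general = [len(directions), n_out, len(directions) - n_out, len(all_bursts)]
    # Truncate long traces and pad short ones so every site yields 4 + n_bursts values.
    head = all_bursts[:n_bursts] + [0] * max(0, n_bursts - len(all_bursts))
    return general + head
```

For the trace `[1, 1, -1, -1, -1, 1, -1, -1]` with `n_bursts=6`, the vector is `[8, 3, 5, 4, 2, -3, 1, -2, 0, 0]`.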

Effect of Application of Ensemble Method on Machine Learning with Insufficient Training Set in Developing Automated English Essay Scoring System

이경호;이공주
- Journal of KIISE / Vol. 42, No. 9 / pp. 1124-1132 / 2015
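In general, a supervised learning algorithm needs a sufficient amount of training data without label bias to be trained properly. However, collecting sufficient, unbiased training data for developing an automated English essay scoring system is difficult. In addition, English writing assessment evaluates the overall quality of an answer from multiple aspects. Because learning models must be built for several assessment areas from small and easily biased training data, it is hard to choose an appropriate machine learning algorithm. In this paper, we show experimentally that these problems can be alleviated through ensemble learning. We ran experiments on the scoring results of short English compositions written by actual middle and high school students, controlling the amount and bias of the training data. Across these experiments, an ensemble method that combines the results of the AdaBoost algorithm by voting performed better overall than the other algorithms.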
https://doi.org/10.5626/JOK.2015.42.9.1124
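Combining several models by voting reduces to a per-essay plurality count over their predicted scores. A minimal sketch, assuming the per-model predictions (for example, from several AdaBoost models trained for one scoring area) are already available as lists of integer scores; breaking ties toward the lower score is an arbitrary assumption, not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Plurality vote per essay; predictions[m][i] is model m's score for essay i."""
    combined = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        best = max(counts.values())
        # Tie-break (assumption): among the most frequent scores, pick the lowest.
        combined.append(min(s for s, c in counts.items() if c == best))
    return combined
```

With three models predicting `[1, 2, 3]`, `[1, 3, 3]`, and `[2, 2, 1]` for three essays, the combined scores are `[1, 2, 3]`.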

Decentralized Structural Diagnosis and Monitoring System for Ensemble Learning on Dynamic Characteristics

신윤수;민경원
- Journal of the Computational Structural Engineering Institute of Korea / Vol. 34, No. 4 / pp. 183-189 / 2021
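Research on generalized monitoring systems that use ambient vibration data to quantify the long-term deterioration of structures is being actively conducted worldwide. In this study, we built a low-cost edge computing system that detects structural anomalies by applying ensemble learning to dynamic characteristics acquired from structures over the long term. The system hardware consists of a Raspberry Pi, a low-cost accelerometer, a tilt sensor, a GPS RTK module, and a LoRa module. Structural anomaly detection through ensemble learning on dynamic characteristics was verified with vibration tests on a laboratory-scale structural model, and a distributed algorithm for real-time extraction of dynamic characteristics, based on these experiments, was deployed on the Raspberry Pi. The field applicability of the system was verified by housing it, installing it at an administrative welfare center in Pohang, and acquiring data.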
https://doi.org/10.7734/COSEIK.2021.34.4.183
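A typical dynamic characteristic extracted from acceleration records is the dominant vibration frequency. A minimal sketch of that extraction by DFT peak picking, assuming a uniformly sampled signal and using a naive O(n²) DFT for clarity (an FFT would be used on real hardware). The abstract does not describe the paper's extraction algorithm or the features fed to the ensemble, so this covers only one plausible preprocessing step; the function name is hypothetical.

```python
import cmath
import math


def dominant_frequency(signal, fs):
    """Frequency (Hz) of the largest non-DC DFT magnitude of `signal` sampled at `fs` Hz."""
    n = len(signal)
    mu = sum(signal) / n
    x = [s - mu for s in signal]  # remove the constant (DC) offset before peak picking
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):  # positive-frequency bins only
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * fs / n  # bin index -> Hz (resolution fs / n)
```

A 5 Hz sine with a constant offset, sampled at 100 Hz for 2 s (200 samples), returns 5.0, because 5 Hz falls exactly on bin 10 at a resolution of 0.5 Hz.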

Feature Ensemble-based Wolff Parkinson White Syndrome classification through ECG

오규태;김인기;김범준;전영훈;곽정환
- Korean Society of Computer Information: Conference Proceedings / Proceedings of the 67th KSCI Winter Conference (2023), Vol. 31, No. 1 / pp. 169-171 / 2023
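Wolff Parkinson White syndrome (WPW) is a condition in which, unlike in the general population, a congenital accessory pathway exists between the atria and the ventricles; compared with normal conduction, it stimulates the ventricles rapidly and causes arrhythmia. Although arrhythmia is the main symptom of WPW, patients are often asymptomatic in daily life, and the condition can appear suddenly in adulthood, so many patients live without knowing they have it. For this reason, in occupations such as truck drivers and doctors, where a sudden deterioration in health can endanger other people's lives, it is very important to detect and treat WPW early to prevent risk in advance. In this paper, we therefore propose a feature ensemble-based deep learning framework for automatically classifying WPW from electrocardiogram (ECG) data. The proposed method improves F1-score and accuracy over methods that use a single 1D-CNN or GRU, showing that it is well suited to this task.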
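The two metrics reported for the WPW classifier can be computed from confusion counts. A minimal sketch, assuming binary labels with WPW as the positive class. When WPW cases are a minority of the recordings, accuracy alone can stay high for a model that misses many of them, which is why F1 (the harmonic mean of precision and recall on the positive class) is usually reported alongside it. The function name is hypothetical.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Accuracy and F1-score for binary labels, with `positive` (e.g. WPW) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    acc = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, f1
```

For `y_true = [1, 1, 0, 0, 1, 0]` and `y_pred = [1, 0, 0, 1, 1, 0]`, there are 2 true positives, 1 false positive, and 1 false negative, so both accuracy and F1 equal 2/3.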

Deep Learning-Based Brain Tumor Classification in MRI images using Ensemble of Deep Features

Kang, Jaeyong;Gwak, Jeonghwan
- Journal of the Korea Society of Computer and Information / Vol. 26, No. 7 / pp. 37-44 / 2021
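Automatic classification of brain MRI images plays an important role in the early diagnosis of brain tumors. In this study, we propose a deep learning-based brain tumor classification model for MRI images that uses an ensemble of deep features. First, deep features of the input MRI image are extracted with three pretrained convolutional neural networks. The extracted deep features are then fed into a classification module composed of fully connected layers. In the classification module, each of the three deep feature vectors first passes through its own fully connected layer to reduce its dimensionality. The three reduced features are then concatenated into a single feature vector, which is fed into another fully connected layer to predict the final class. To evaluate the proposed model, we used a publicly available brain MRI dataset. Experimental results confirmed that the proposed model performs better than other machine learning-based models.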
https://doi.org/10.9708/jksci.2021.26.07.037
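The classification module described in this abstract (a per-backbone fully connected reduction, concatenation, then a final fully connected classifier) can be sketched as a plain forward pass. This is a minimal illustration with toy weights: the ReLU activation, the softmax output, and all layer sizes are assumptions, and in practice the three feature vectors would come from the pretrained CNNs while the weights would be trained with a deep learning framework. The function names are hypothetical.

```python
import math


def dense(x, weights, bias):
    """Fully connected layer: y_j = sum_i W[j][i] * x_i + b_j."""
    return [sum(xi * wi for xi, wi in zip(x, row)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def fusion_head(deep_features, reducers, classifier):
    """Reduce each backbone's features with its own FC layer, concatenate, then classify.

    deep_features: one feature vector per pretrained CNN (three in the paper).
    reducers: one (weights, bias) pair per backbone; classifier: one (weights, bias) pair.
    """
    fused = []
    for feat, (w, b) in zip(deep_features, reducers):
        fused += relu(dense(feat, w, b))  # per-backbone dimensionality reduction
    w_out, b_out = classifier
    return softmax(dense(fused, w_out, b_out))  # class probabilities
```

Giving each backbone its own reduction layer lets feature vectors of different sizes contribute comparable, fixed-size slices to the concatenated vector.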

A Study on the Insider Behavior Analysis Framework for Detecting Information Leakage Using Network Traffic Collection and Restoration

고장혁;이동호
- Journal of the Korea Society of Digital Industry and Information Management / Vol. 13, No. 4 / pp. 125-139 / 2017
In this paper, we developed a framework that detects and predicts insider information leakage by collecting and restoring network traffic. For automated behavior analysis, meta information and behavior information obtained from the collected network traffic are used as machine learning features. Using these features, we built and trained a behavior model, a network model, and protocol-specific models. In addition, we developed an ensemble model that converts the results of these models into numeric scores and sums them. We also developed a function that presents information leakage candidates and lets analysts view meta information and behavior information from various perspectives through visual analysis. This supports both rule-based and machine learning-based threat detection. In the future, we plan to build an ensemble model that applies a regression model to the outputs of the individual models, and to develop a model based on deep learning.
https://doi.org/10.17662/ksdim.2017.13.4.125
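The abstract describes combining the behavior, network, and protocol-specific models by converting their results to numbers and summing them. A minimal sketch of one such rule, assuming each model outputs a raw score per user, that scores are min-max normalised per model before summing, and that the top-scoring users are presented as leakage candidates; the normalisation, the model names, and the function names are hypothetical.

```python
def ensemble_scores(model_scores):
    """Sum per-model min-max-normalised scores into one risk score per user.

    model_scores: dict mapping model name -> {user: raw score}; all models score the same users.
    """
    users = list(next(iter(model_scores.values())))
    total = {u: 0.0 for u in users}
    for scores in model_scores.values():
        lo, hi = min(scores.values()), max(scores.values())
        span = hi - lo
        for u in users:
            # A model that gives every user the same score contributes nothing.
            total[u] += (scores[u] - lo) / span if span else 0.0
    return total


def leakage_candidates(total, top_k=3):
    """Users with the highest combined scores, for analyst review."""
    return sorted(total, key=total.get, reverse=True)[:top_k]
```

Normalising first keeps a model whose raw scores happen to span a wider range from dominating the sum; the regression-based combination mentioned as future work would instead learn the weights.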

Pattern Selection Using the Bias and Variance of Ensemble

신현정;조성중
- Journal of the Korean Institute of Industrial Engineers / Vol. 28, No. 1 / pp. 112-127 / 2002
A useful pattern is one that contributes much to learning. For a classification problem, patterns near the class boundary surfaces carry more information for the classifier; for a regression problem, patterns near the estimated surface carry more information. In both cases, usefulness is defined only for patterns with no error or negligible error. Using only the useful patterns gives several benefits. First, the memory and time complexity of learning decreases. Second, overfitting is avoided even when the learner is over-sized. Third, learning yields more stable learners. In this paper, we propose a pattern 'utility index' that measures the utility of an individual pattern. The utility index is based on the bias and variance of a pattern trained by a network ensemble. In classification, a pattern with low bias and high variance gets a high score; in regression, on the other hand, a pattern with low bias and low variance gets a high score. Based on the distribution of the utility index, the original training set is divided into a high-score group and a low-score group, and only the high-score group is then used for training. The proposed method is tested on synthetic and real-world benchmark datasets, where it gives better or at least similar performance.
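The utility index can be sketched from an ensemble's member outputs for each pattern: bias as the distance between the ensemble mean and the target, and variance as the spread of the member outputs. The abstract gives only the direction of the scoring (classification favours low bias and high variance; regression favours low bias and low variance), so the formulas `variance - bias` and `-(bias + variance)`, and the split at the upper median of the scores, are hypothetical choices for illustration.

```python
from statistics import mean, pvariance


def bias_variance(member_preds, target):
    """Bias |ensemble mean - target| and population variance of the members' outputs."""
    return abs(mean(member_preds) - target), pvariance(member_preds)


def utility(member_preds, target, task):
    b, v = bias_variance(member_preds, target)
    # Hypothetical index: classification rewards low bias and high variance,
    # regression rewards low bias and low variance.
    return v - b if task == "classification" else -(b + v)


def select_patterns(ensemble_outputs, targets, task):
    """Indices of patterns whose utility is at or above the upper median utility."""
    scores = [utility(p, t, task) for p, t in zip(ensemble_outputs, targets)]
    cut = sorted(scores)[len(scores) // 2]
    return [i for i, s in enumerate(scores) if s >= cut]
```

For a regression ensemble whose members output `[1.0, 1.0, 1.0]`, `[0.0, 2.0, 1.0]`, `[3.0, 3.0, 3.0]`, and `[1.1, 0.9, 1.0]` on four patterns with target 1.0, the sketch keeps patterns 0 and 3, the ones that are both accurate and consistent across members.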
