• Title/Summary/Keyword: ensemble method

Field observation of sediment suspension in the surf zone (쇄파대의 저질부유에 관한 현지관측)

  • Shin, Seung-Ho;Kuriyama, Yoshiaki
    • Journal of Navigation and Port Research / v.27 no.4 / pp.455-463 / 2003
  • Time series of suspended sediment concentration, surface elevation and velocity were measured and analysed to investigate the role of waves, and the predominance of infragravity wave components, in sediment suspension in the surf zone. For a detailed investigation, we applied cross-spectral analysis between suspended sediment concentration and characteristic wave parameters. We also applied ensemble-average analysis to the long-period wave component, which dominates sediment suspension at the measurement point. The results are summarized as follows: 1) The relationship between suspended sediment concentration and the characteristic wave parameters is stronger for the long-period standing wave components than for the most energetic long-wave component (about 100 s). These standing wave components have periods of about 60 s and 30 s, for which the measurement point lies at the node of the first mode and the antinode of the second mode, respectively. 2) Suspended sediment concentration increases during the phase in which the velocity of the first-mode long-period standing wave (60 s) accelerates onshore, that is, when the water surface on the offshore side is higher than on the onshore side.
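Cross-spectral analysis asks how strongly two records covary at each frequency. The following is a minimal NumPy sketch of magnitude-squared coherence, for example between surface elevation and suspended sediment concentration. It assumes non-overlapping Hann-windowed segments. The function name, segment length and synthetic 60 s signals are illustrative assumptions, not the authors' processing chain.

```python
import numpy as np


def coherence(x, y, fs, nperseg):
    """Magnitude-squared coherence |Pxy|^2 / (Pxx * Pyy).

    Sketch only: averages spectra over non-overlapping Hann-windowed
    segments (an assumed Welch-style choice, not the paper's settings).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    nseg = len(x) // nperseg
    window = np.hanning(nperseg)
    nbins = nperseg // 2 + 1
    pxx = np.zeros(nbins)
    pyy = np.zeros(nbins)
    pxy = np.zeros(nbins, dtype=complex)
    for k in range(nseg):
        seg = slice(k * nperseg, (k + 1) * nperseg)
        # Remove each segment's mean so offsets do not dominate the lowest bin.
        fx = np.fft.rfft(window * (x[seg] - x[seg].mean()))
        fy = np.fft.rfft(window * (y[seg] - y[seg].mean()))
        pxx += np.abs(fx) ** 2
        pyy += np.abs(fy) ** 2
        pxy += np.conj(fx) * fy  # averaged cross-spectrum
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return freqs, np.abs(pxy) ** 2 / (pxx * pyy)
```

With a 2 Hz record and 1200-sample (600 s) segments, the bin spacing is 1/600 Hz, so a 60 s standing-wave component falls exactly on bin 10. Segment length trades frequency resolution against the number of averages.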

A study on a tendency of parameters for nonstationary distribution using ensemble empirical mode decomposition method (앙상블 경험적 모드분해법을 활용한 비정상성 확률분포형의 매개변수 추세 분석에 관한 연구)

  • Kim, Hanbeen;Kim, Taereem;Shin, Hongjoon;Heo, Jun-Haeng
    • Journal of Korea Water Resources Association / v.50 no.4 / pp.253-261 / 2017
  • Nonstationary frequency analysis has been studied extensively in recent years because nonstationarity occurs in hydrologic time series. Various forms of probability distributions have been proposed to account for the time-dependent statistical characteristics of nonstationary data, and various parameter estimation methods have also been studied. In this study, we introduce a parameter estimation method for the nonstationary Gumbel distribution using ensemble empirical mode decomposition (EEMD). We then compare the results with those of the method of maximum likelihood. The analysis uses annual maximum rainfall data with trends, observed by the Korea Meteorological Administration (KMA). For the linear-trend data, both EEMD and the method of maximum likelihood selected an appropriate nonstationary Gumbel distribution. For the quadratic-trend data, EEMD selected a more appropriate nonstationary Gumbel distribution than the method of maximum likelihood.
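The sketch below illustrates what a nonstationary Gumbel model with a linear location trend looks like. It makes two explicit assumptions. First, a least-squares linear trend stands in for the EEMD trend component, and no EEMD is performed here. Second, the scale is then obtained by the method of moments. The function name and the synthetic data are hypothetical.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329


def fit_nonstationary_gumbel(years, annual_max):
    """Gumbel with location mu(t) = mu0 + mu1 * t and constant scale sigma.

    Assumption: a linear least-squares trend replaces the EEMD trend
    component; sigma comes from the method of moments on the residuals.
    """
    t = np.asarray(years, dtype=float)
    x = np.asarray(annual_max, dtype=float)
    slope, intercept = np.polyfit(t, x, 1)  # mean trend E[X(t)] = intercept + slope * t
    residuals = x - (intercept + slope * t)
    # Gumbel moments: Var[X] = (pi * sigma)^2 / 6 and E[X] = mu + gamma * sigma.
    sigma = np.std(residuals, ddof=1) * np.sqrt(6.0) / np.pi
    mu0 = intercept - EULER_GAMMA * sigma
    return mu0, slope, sigma
```

Because the Gumbel mean is the location plus a constant times the scale, a linear trend in the mean implies the same slope in the location parameter. That is why the fitted slope is reused directly as mu1.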

Simulation of Multi-Variate Random Processes (다변수 확률과정의 시뮬레이션)

  • ;M. Shinozuka
    • Proceedings of the Computational Structural Engineering Institute Conference / 1990.04a / pp.24-30 / 1990
  • An improved algorithm for simulating multivariate random processes is presented, based on the spectral representation method. Conventional methods generate sample time histories that satisfy the target spectral density matrix only in the ensemble-average sense. The present method, by contrast, generates sample functions that satisfy the target spectra in the ergodic sense. An example analysis is given for the simulation of three-component earthquake accelerations.
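For orientation, the sketch below implements only the basic single-variable spectral representation, f(t) = Σ sqrt(2 G(ω_k) Δω) cos(ω_k t + φ_k) with independent uniform phases, where G is an assumed one-sided spectral density. It is not the paper's improved multivariate algorithm. The harder multivariate part, matching the cross-spectra between components, is not attempted. The function name and arguments are hypothetical.

```python
import numpy as np


def simulate_spectral_representation(one_sided_psd, omega_max, n_freq, n_time, rng):
    """One sample path of a zero-mean Gaussian process (single-variable sketch).

    f(t) = sum_k sqrt(2 * G(w_k) * dw) * cos(w_k * t + phi_k), phi_k ~ U(0, 2*pi).
    """
    d_omega = omega_max / n_freq
    omega = d_omega * np.arange(1, n_freq + 1)  # w_k = k * dw, k = 1..N
    amplitude = np.sqrt(2.0 * one_sided_psd(omega) * d_omega)
    phase = rng.uniform(0.0, 2.0 * np.pi, size=n_freq)
    # Sample exactly one period T0 = 2*pi/dw so the cosines are orthogonal on the grid
    # (n_time must exceed 2 * n_freq to avoid aliasing).
    t = np.arange(n_time) * (2.0 * np.pi / d_omega) / n_time
    f = (amplitude[:, None] * np.cos(omega[:, None] * t[None, :] + phase[:, None])).sum(axis=0)
    target_variance = float(np.sum(one_sided_psd(omega) * d_omega))
    return t, f, target_variance
```

Over one full period 2π/Δω, the time-averaged variance of this single sample equals Σ G(ω_k) Δω whatever phases are drawn. That makes it a convenient check that the amplitudes are scaled correctly.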

Prediction of Andong Reservoir Inflow Using Ensemble Technique (앙상블 기법을 이용한 안동댐 유입량 예측)

  • Kang, Min Suk;Yu, Myungsu;Yi, Jaeeung
    • KSCE Journal of Civil and Environmental Engineering Research / v.34 no.3 / pp.795-804 / 2014
  • In this study, the monthly and 10-day inflows to Andong Reservoir from July to September 2011 are predicted using the SWAT model and an ensemble technique. To improve accuracy, a weighting method based on the monthly and 10-day rainfall forecasts of the Korea Meteorological Administration (KMA) is applied. When the KMA rainfall forecast is close to the actual rainfall, the PDF-Ratio method gives the best result. When the historical occurrence of high rainfall is close to the actual rainfall, the modified PDF-Ratio method gives the best result. This method can improve prediction accuracy even when the KMA forecast is inaccurate. Conversely, the uniform method gives the best result when the KMA forecast differs from the actual rainfall and from the historical occurrence statistics for the lower rainfall range.
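The abstract does not give the PDF-Ratio or weighting formulas. The following is therefore only a generic sketch of one common way to weight ensemble inflow traces by a tercile (below/normal/above) rainfall forecast, not the paper's method. Equal forecast probabilities give every trace the same weight, presumably matching the uniform method. The tercile scheme and the function names are assumptions.

```python
def tercile_weights(historical_rain, forecast_probs):
    """Weight historical ensemble traces by a tercile rainfall forecast (hypothetical scheme).

    historical_rain: rainfall of each historical year (one inflow trace per year), len >= 3.
    forecast_probs: (p_below, p_normal, p_above), summing to 1.
    Each category's probability is shared equally among the traces in that category.
    """
    n = len(historical_rain)
    order = sorted(range(n), key=lambda i: historical_rain[i])
    category = [0] * n
    for rank, i in enumerate(order):
        category[i] = min(3 * rank // n, 2)  # 0 = below, 1 = normal, 2 = above
    counts = [category.count(c) for c in range(3)]
    return [forecast_probs[category[i]] / counts[category[i]] for i in range(n)]


def weighted_mean(values, weights):
    """Weighted ensemble mean, e.g. of simulated inflow traces."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```

For example, with six historical years and a forecast of (0.2, 0.3, 0.5), the two wettest years each receive weight 0.25. The two driest each receive 0.1.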

Named Entity Recognition Using Distant Supervision and Active Bagging (원거리 감독과 능동 배깅을 이용한 개체명 인식)

  • Lee, Seong-hee;Song, Yeong-kil;Kim, Hark-soo
    • Journal of KIISE / v.43 no.2 / pp.269-274 / 2016
  • Named entity recognition is the process of extracting named entities from sentences and determining their categories. Previous studies on named entity recognition have relied primarily on supervised learning. Supervised learning requires a large training corpus manually annotated with named entity categories, and constructing such a corpus by hand is time-consuming and labor-intensive. We propose a semi-supervised learning method that minimizes the cost of constructing a training corpus and rapidly improves named entity recognition performance. The proposed method uses distant supervision to construct the initial training corpus. It then effectively removes noisy sentences from that corpus using active bagging, an ensemble method that combines bagging and active learning. In experiments, the proposed method improved the F1-score of named entity recognition from 67.36% to 76.42% after 15 iterations of active bagging.
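The noise-removal idea can be illustrated with bagging alone. Train several models on bootstrap samples, then flag training examples whose majority vote disagrees with their (distantly supervised) label. This is a rough sketch, not the paper's active bagging. It omits the active-learning step, and the toy nearest-centroid learner and majority rule are assumptions for illustration.

```python
import numpy as np


def nearest_centroid_fit(x, y):
    """Toy base learner (assumed for illustration): one centroid per class, labels 0 and 1."""
    return np.array([x[y == c].mean(axis=0) for c in (0, 1)])


def nearest_centroid_predict(centroids, x):
    dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def bagging_noise_filter(x, y, n_models, rng):
    """Flag examples whose bagged majority vote disagrees with their (possibly noisy) label."""
    votes = np.zeros(len(y))
    for _ in range(n_models):
        idx = rng.integers(0, len(y), size=len(y))  # bootstrap sample with replacement
        centroids = nearest_centroid_fit(x[idx], y[idx])
        votes += nearest_centroid_predict(centroids, x)  # each model votes for class 0 or 1
    majority = (votes / n_models > 0.5).astype(int)
    return majority != y  # True = suspected noisy example
```

An odd number of models avoids tied votes. In an active-learning variant, the examples where the vote is closest to a tie would be the natural candidates to send to a human annotator.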

Implementation of Spatial Downscaling Method Based on Gradient and Inverse Distance Squared (GIDS) for High-Resolution Numerical Weather Prediction Data (고해상도 수치예측자료 생산을 위한 경도-역거리 제곱법(GIDS) 기반의 공간 규모 상세화 기법 활용)

  • Yang, Ah-Ryeon;Oh, Su-Bin;Kim, Joowan;Lee, Seung-Woo;Kim, Chun-Ji;Park, Soohyun
    • Atmosphere / v.31 no.2 / pp.185-198 / 2021
  • In this study, we examined a spatial downscaling method based on Gradient and Inverse Distance Squared (GIDS) weighting. The method produces high-resolution grid data from a numerical weather prediction model over the Korean Peninsula, which has complex terrain. GIDS is a simple and effective geostatistical downscaling method that uses gradients with respect to horizontal position and elevation. Predicted meteorological variables (e.g., temperature and 3-hr accumulated rainfall) from the Limited-area ENsemble prediction System (LENS; 3 km horizontal grid spacing) were used as input to GIDS to produce a higher-resolution (1.5 km) data set. The results were compared with those from bilinear interpolation. For temperature, GIDS effectively produced high-resolution gridded data with a continuous spatial distribution and a strong dependence on topography. Agreement with observations improved as the search radius increased from 10 to 30 km. However, GIDS performed relatively poorly for precipitation. Although GIDS is efficient at producing higher-resolution gridded temperature data, further study is required before it can be applied to rainfall events.
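GIDS, as it is commonly formulated, works in three steps. First, it fits a multiple regression of the variable on x, y and elevation using the source points within the search radius. Second, it moves each source value to the target's position and elevation using the fitted gradients. Third, it averages the adjusted values with inverse-distance-squared weights. The sketch below follows that formulation. The function signature and the minimum-point check are assumptions, not the authors' code.

```python
import numpy as np


def gids(xy, elev, values, target_xy, target_elev, radius):
    """Gradient plus inverse distance squared (GIDS) estimate at one target point (sketch).

    xy: (n, 2) source coordinates, elev: (n,) source elevations, values: (n,) source values.
    """
    xy = np.asarray(xy, dtype=float)
    elev = np.asarray(elev, dtype=float)
    values = np.asarray(values, dtype=float)
    dx = target_xy[0] - xy[:, 0]
    dy = target_xy[1] - xy[:, 1]
    d2 = dx ** 2 + dy ** 2
    near = d2 <= radius ** 2  # only source points inside the search radius are used
    if near.sum() < 4:
        raise ValueError("need at least 4 points inside the radius to fit the gradients")
    if np.any(d2[near] == 0.0):  # target coincides with a source point
        return float(values[near][np.argmin(d2[near])])
    # Regression value = b0 + Cx * x + Cy * y + Ce * elevation over the nearby points.
    design = np.column_stack([np.ones(near.sum()), xy[near, 0], xy[near, 1], elev[near]])
    coef, *_ = np.linalg.lstsq(design, values[near], rcond=None)
    _, cx, cy, ce = coef
    # Move every source value to the target's position and elevation with the fitted gradients.
    adjusted = values[near] + cx * dx[near] + cy * dy[near] + ce * (target_elev - elev[near])
    w = 1.0 / d2[near]  # inverse distance squared weights
    return float(np.sum(w * adjusted) / np.sum(w))
```

If a field is exactly linear in x, y and elevation, this procedure reproduces it exactly at any target. A larger radius gives the regression more points to estimate the lapse rate, which is consistent with the improvement the abstract reports when the radius grows from 10 to 30 km.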

Drought index forecast using ensemble learning (앙상블 기법을 이용한 가뭄지수 예측)

  • Jeong, Jihyeon;Cha, Sanghun;Kim, Myojeong;Kim, Gwangseob;Lim, Yoon-Jin;Lee, Kyeong Eun
    • Journal of the Korean Data and Information Science Society / v.28 no.5 / pp.1125-1132 / 2017
  • As drought events become more severe and more frequent, many studies on drought forecasting have been conducted to improve forecast accuracy. However, drought events are difficult to predict with a single model because their temporal behavior is nonlinear and complex. To overcome the shortcomings of the single-model approach, we first build various single models that explain the relationship between the meteorological drought index, the Standardized Precipitation Index (SPI), and other independent variables such as world climate indices. We then develop combined models using the stochastic gradient descent method within an ensemble learning framework.
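The abstract does not say exactly how stochastic gradient descent combines the single models. One plausible reading is stacking, where linear combination weights for the base models' SPI predictions are learned by SGD on squared error. The sketch below assumes that reading. The function name, learning rate and epoch count are hypothetical.

```python
import numpy as np


def sgd_stack_weights(base_preds, target, lr=0.05, epochs=50, seed=0):
    """Learn linear weights combining base-model predictions by plain SGD (assumed stacking).

    base_preds: (n_samples, n_models) predictions of the single models.
    target: (n_samples,) observed values, e.g. SPI.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_models = base_preds.shape
    w = np.full(n_models, 1.0 / n_models)  # start from a simple average
    for _ in range(epochs):
        for i in rng.permutation(n_samples):  # visit samples in random order
            error = base_preds[i] @ w - target[i]
            w -= lr * error * base_preds[i]  # gradient step on 0.5 * error**2
    return w
```

Starting from equal weights means that, with too few updates, the combination falls back toward a plain model average. The learning rate must stay small relative to the squared norm of the prediction vectors for the updates to remain stable.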

Estimating Korean Pine(Pinus koraiensis) Habitat Distribution Considering Climate Change Uncertainty - Using Species Distribution Models and RCP Scenarios - (불확실성을 고려한 미래 잣나무의 서식 적지 분포 예측 - 종 분포 모형과 RCP시나리오를 중심으로 -)

  • Ahn, Yoonjung;Lee, Dong-Kun;Kim, Ho Gul;Park, Chan;Kim, Jiyeon;Kim, Jae-uk
    • Journal of the Korean Society of Environmental Restoration Technology / v.18 no.3 / pp.51-64 / 2015
  • Climate change will significantly affect species distributions in forests. Pinus koraiensis, commonly called Korean pine, is normally distributed in frigid zones, so the severe heat caused by climate change could affect its distribution. This study therefore predicted the distribution of Korean pine and its suitable habitat area, while accounting for uncertainty, by applying climate change scenarios to an ensemble model. First, a site index was considered when selecting presence and absence points, and the points were selected with a stratified method. Second, environmental and climate variables were chosen through a literature review and confirmed with experts; these variables were used as input data for BIOMOD2. Third, a model of the present distribution was built and validated with ROC analysis. Lastly, RCP scenarios were applied to the models to create future distribution models. The individual models differed considerably in their results. However, most individual models and the ensemble models estimated that the suitable habitat area would decrease in both the mid-term future (2040s) and the long-term future (2090s).
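ROC validation and ensemble forecasting fit together naturally: each model's validation AUC can serve as its weight when averaging habitat suitability maps. BIOMOD2 offers several ensemble rules, such as a weighted mean, but the abstract does not say which one was used. This pure-Python sketch assumes an AUC-weighted mean, with hypothetical function names.

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney statistic: share of presence/absence pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]  # presence points
    neg = [s for l, s in zip(labels, scores) if l == 0]  # absence points
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0  # ties count one half
    return wins / (len(pos) * len(neg))


def auc_weighted_ensemble(model_maps, aucs):
    """Per-cell habitat suitability averaged across models, weighted by validation AUC (assumed rule)."""
    total = sum(aucs)
    n_cells = len(model_maps[0])
    return [sum(a * m[c] for a, m in zip(aucs, model_maps)) / total for c in range(n_cells)]
```

An AUC of 0.5 means the model ranks presences no better than chance. A stricter variant would drop such models before averaging instead of only down-weighting them.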

Searching for Optimal Ensemble of Feature-classifier Pairs in Gene Expression Profile using Genetic Algorithm (유전알고리즘을 이용한 유전자발현 데이타상의 특징-분류기쌍 최적 앙상블 탐색)

  • 박찬호;조성배
    • Journal of KIISE:Software and Applications / v.31 no.4 / pp.525-536 / 2004
  • A gene expression profile is numerical data on the gene expression levels of an organism, measured with a microarray. Because each tissue generally shows different expression levels in related genes, diseases can be classified using gene expression profiles. Since not all genes are related to a disease, the related genes must first be selected (feature selection), and the selected genes must then be classified properly. This paper proposes a GA-based method for searching for the optimal ensemble of feature-classifier pairs. The pairs combine seven feature selection methods based on correlation, similarity and information theory with six representative classifiers. In experiments using leave-one-out cross-validation on two cancer-related gene expression profiles, we found ensembles that performed much better than every individual feature-classifier pair on both the lymphoma and colon datasets.
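A GA search over ensembles can encode each candidate ensemble as a bit mask over the feature-classifier pairs and use majority-vote accuracy as fitness. The sketch below assumes that each pair's predictions on validation samples are precomputed (for example, from leave-one-out runs) and that labels are binary. The encoding, operators and parameters are illustrative assumptions, not the paper's settings.

```python
import random


def vote_accuracy(mask, predictions, labels):
    """Accuracy of the majority vote of the selected pairs (binary labels; ties go to class 0)."""
    members = [p for keep, p in zip(mask, predictions) if keep]
    if not members:
        return 0.0
    correct = 0
    for j, label in enumerate(labels):
        ones = sum(m[j] for m in members)
        vote = 1 if 2 * ones > len(members) else 0
        correct += vote == label
    return correct / len(labels)


def ga_search(predictions, labels, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Search bit masks over at least 2 feature-classifier pairs with a simple elitist GA (sketch)."""
    rng = random.Random(seed)
    n = len(predictions)

    def fitness(mask):
        return vote_accuracy(mask, predictions, labels)

    # Seed with every single-pair ensemble so the result is never worse than the best single pair.
    pop = [[int(i == k) for i in range(n)] for k in range(n)]
    while len(pop) < pop_size:
        pop.append([rng.randint(0, 1) for _ in range(n)])
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        next_pop = pop[:2]  # elitism: keep the two best masks unchanged
        while len(next_pop) < pop_size:
            a, b = rng.sample(pop[: pop_size // 2], 2)  # parents from the fitter half
            cut = rng.randint(1, n - 1)
            child = a[:cut] + b[cut:]  # one-point crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    return best, fitness(best)
```

Three pairs that each err on different samples can vote perfectly even though none of them is perfect alone. That is the kind of complementarity such a search can exploit.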

API Feature Based Ensemble Model for Malware Family Classification (악성코드 패밀리 분류를 위한 API 특징 기반 앙상블 모델 학습)

  • Lee, Hyunjong;Euh, Seongyul;Hwang, Doosung
    • Journal of the Korea Institute of Information Security & Cryptology / v.29 no.3 / pp.531-539 / 2019
  • This paper proposes training features for malware family analysis and analyzes the multi-class classification performance of ensemble models. We construct training data by extracting API and DLL information from malware executables, and we use Random Forest and XGBoost, which are decision-tree-based algorithms. We analyze the API and DLL information frequently used by malware and convert high-dimensional features into low-dimensional ones. From this, we propose API, API-DLL and DLL-CM features for malware detection and family classification. The proposed feature selection method reduces data dimensionality and speeds up training. In the performance comparison, Random Forest achieved a malware detection rate of 93.0%, and XGBoost achieved an accuracy of 92.0% on the malware family dataset. On the malware family dataset including benign samples, the false positive rate was about 3.5% for both Random Forest and XGBoost.
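The abstract does not define the API, API-DLL or DLL-CM features precisely. The sketch below is only a hedged illustration of turning high-dimensional API information into a compact vector. It keeps the APIs used by the most samples and counts their calls per sample, and the resulting vectors could then be fed to tree ensembles such as Random Forest or XGBoost. The function names and the top-k rule are assumptions.

```python
from collections import Counter


def build_api_vocabulary(api_traces, top_k):
    """Keep the top_k APIs by number of samples that use them (assumed dimension-reduction rule)."""
    doc_freq = Counter()
    for trace in api_traces:
        doc_freq.update(set(trace))  # count each API once per sample
    # Sort by descending frequency, then by name, so ties are broken deterministically.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [api for api, _ in ranked[:top_k]]


def api_count_vector(trace, vocabulary):
    """Fixed-length feature vector: call count of each vocabulary API in one sample."""
    counts = Counter(trace)
    return [counts[api] for api in vocabulary]
```

For example, with three traces in which WriteFile appears in all three and CreateFileW in two, a top-2 vocabulary is ["WriteFile", "CreateFileW"]. A trace calling CreateFileW twice and WriteFile once then maps to [1, 2].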