• Title/Summary/Keyword: long-memory


Comparative Study of Data Preprocessing and ML&DL Model Combination for Daily Dam Inflow Prediction (댐 일유입량 예측을 위한 데이터 전처리와 머신러닝&딥러닝 모델 조합의 비교연구)

  • Youngsik Jo;Kwansue Jung
    • Proceedings of the Korea Water Resources Association Conference / 2023.05a / pp.358-358 / 2023
  • In this study, representative machine learning and deep learning (ML&DL) models that have been applied to rainfall-runoff analysis in the water resources field were used to compare daily dam inflow prediction performance across scenario combinations of data characteristics and ML&DL models, considering not only hyperparameter tuning but also combinations and preprocessing (lag-time, moving averages, etc.) of meteorological and hydrological data suited to each model's characteristics. For the Soyanggang Dam basin, meteorological and hydrological data accumulated from 1974 to 2021 were used: 1) rainfall, 2) inflow, and 3) meteorological records were taken as the main explanatory (independent) variables, and a) lag-time, b) moving averages, and c) inflow component separation were applied to them, giving a total of 36 scenario combinations used as ML&DL inputs. Ten ML&DL models were compared: seven ML methods, 1) Linear Regression (LR), 2) Lasso, 3) Ridge, 4) Support Vector Regression (SVR), 5) Random Forest (RF), 6) Light Gradient Boosting Model (LGBM), and 7) XGBoost, and three DL methods, 8) Long Short-Term Memory (LSTM), 9) Temporal Convolutional Network (TCN), and 10) LSTM-TCN; the most suitable data-combination characteristics and ML&DL model for daily inflow prediction were presented together with the performance evaluation. Comparing and analyzing the inflow predictions of the trained models, in the Soyanggang Dam basin the TCN model performed best among the deep learning models (TCN > TCN-LSTM > LSTM), Random Forest and LGBM performed best among the tree-based machine learning models (RF, LGBM > XGB), and SVR also achieved performance on par with LGBM. The three regression models (LR, Lasso, Ridge) showed relatively low performance. Across the 36 combinations of rainfall, inflow, and meteorological series for Soyanggang Dam inflow prediction, every model except the three regression models achieved an NSE (Nash-Sutcliffe Efficiency) above 0.8 (up to 0.867) when lag-time was applied to the rainfall series, and performance improved further to an NSE above 0.85 (up to 0.901) when the lag-time-adjusted rainfall and inflow series were combined.
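The lag-time and moving-average preprocessing described in this abstract can be illustrated with a minimal sketch. This is not the authors' code: the column names, lag lengths, window sizes, and Random Forest settings below are assumptions chosen only to show the shape of such a pipeline, with the NSE score used as in the study.

```python
# Minimal sketch (not the authors' code): lag-time and moving-average features
# for daily dam inflow prediction, scored with the Nash-Sutcliffe Efficiency.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 - SSE / total variance of the observations."""
    obs, sim = np.asarray(obs), np.asarray(sim)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def build_features(df, lags=(1, 2, 3), windows=(3, 7)):
    """Add lag-time and moving-average columns for rainfall and inflow."""
    out = df.copy()
    for lag in lags:
        out[f"rain_lag{lag}"] = out["rainfall"].shift(lag)
        out[f"inflow_lag{lag}"] = out["inflow"].shift(lag)
    for w in windows:
        out[f"rain_ma{w}"] = out["rainfall"].rolling(w).mean()
    return out.dropna()

# 'df' stands in for the daily rainfall/inflow record of a basin.
rng = np.random.default_rng(0)
df = pd.DataFrame({"rainfall": rng.gamma(2.0, 5.0, 1000),
                   "inflow": rng.gamma(3.0, 10.0, 1000)})
feat = build_features(df)
X, y = feat.drop(columns="inflow"), feat["inflow"]
split = int(len(feat) * 0.8)

model = RandomForestRegressor(n_estimators=300, random_state=0)
model.fit(X[:split], y[:split])
print("NSE:", nse(y[split:], model.predict(X[split:])))
```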


Malware Detection Using Deep Recurrent Neural Networks with no Random Initialization

  • Amir Namavar Jahromi;Sattar Hashemi
    • International Journal of Computer Science & Network Security / v.23 no.8 / pp.177-189 / 2023
  • Malware detection is an increasingly important operational focus in cyber security, particularly given the fast pace of such threats (e.g., new malware variants introduced every day). There has been great interest in exploring the use of machine learning techniques in automating and enhancing the effectiveness of malware detection and analysis. In this paper, we present a deep recurrent neural network solution as a stacked Long Short-Term Memory (LSTM) with pre-training as a regularization method to avoid random network initialization. In our proposal, we use global and short dependencies of the inputs. With pre-training, we avoid random initialization and are able to improve the accuracy and robustness of malware threat hunting. The proposed method speeds up convergence (in comparison to a stacked LSTM) by reducing the length of malware OpCode or bytecode sequences; hence, the complexity of our final method is reduced. This leads to better accuracy, a higher Matthews Correlation Coefficient (MCC), and a higher Area Under the Curve (AUC) in comparison to a standard LSTM with similar detection time. Our proposed method can be applied in real-time malware threat hunting, particularly for safety-critical systems such as eHealth or the Internet of Military Things, where poor convergence of the model could lead to catastrophic consequences. We evaluate the effectiveness of our proposed method on Windows, Ransomware, Internet of Things (IoT), and Android malware datasets using both static and dynamic analysis. For IoT malware detection, we also present a comparative summary of the performance of our proposed method and the standard stacked LSTM method on an IoT-specific dataset. More specifically, our proposed method achieves an accuracy of 99.1% in detecting IoT malware samples, with an AUC of 0.985 and an MCC of 0.95, thus outperforming standard LSTM-based methods on these key metrics.
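The pre-training idea summarized above, initializing a stacked LSTM from an unsupervised step rather than from random weights, can be sketched roughly as follows. This is not the authors' implementation: the layer sizes, the simple sequence-autoencoder pre-training objective, and the toy OpCode vocabulary are assumptions for illustration only.

```python
# Sketch (assumed sizes and objective): pre-train the first layer of a stacked
# LSTM on OpCode sequences before supervised fine-tuning, avoiding random init.
import torch
import torch.nn as nn

VOCAB, EMB, HID, CLASSES = 256, 64, 128, 2      # assumed OpCode vocabulary size, etc.

class StackedLSTMClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.lstm1 = nn.LSTM(EMB, HID, batch_first=True)   # layer to be pre-trained
        self.lstm2 = nn.LSTM(HID, HID, batch_first=True)
        self.head = nn.Linear(HID, CLASSES)

    def forward(self, x):
        h, _ = self.lstm1(self.embed(x))
        h, _ = self.lstm2(h)
        return self.head(h[:, -1])              # classify from the last time step

def pretrain_first_layer(model, seqs, epochs=3):
    """Unsupervised pre-training: reconstruct the embedded sequence from the
    first LSTM layer's outputs so fine-tuning starts from non-random weights."""
    decoder = nn.Linear(HID, EMB)
    params = (list(model.embed.parameters()) + list(model.lstm1.parameters())
              + list(decoder.parameters()))
    opt = torch.optim.Adam(params, lr=1e-3)
    for _ in range(epochs):
        emb = model.embed(seqs)
        out, _ = model.lstm1(emb)
        loss = nn.functional.mse_loss(decoder(out), emb.detach())
        opt.zero_grad(); loss.backward(); opt.step()

model = StackedLSTMClassifier()
opcodes = torch.randint(0, VOCAB, (32, 100))    # toy batch of OpCode index sequences
pretrain_first_layer(model, opcodes)
logits = model(opcodes)                         # supervised fine-tuning would follow
print(logits.shape)
```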

Threshold heterogeneous autoregressive modeling for realized volatility (임계 HAR 모형을 이용한 실현 변동성 분석)

  • Sein Moon;Minsu Park;Changryong Baek
    • The Korean Journal of Applied Statistics / v.36 no.4 / pp.295-307 / 2023
  • The heterogeneous autoregressive (HAR) model is a simple linear model that is commonly used to explain long memory in realized volatility. However, as realized volatility exhibits more complicated features such as conditional heteroscedasticity, the leverage effect, and volatility clustering, it is necessary to extend the simple HAR model. Therefore, to better incorporate these stylized facts, we propose a threshold HAR model with GARCH errors, namely the THAR-GARCH model. The THAR-GARCH model is a nonlinear model whose coefficients vary according to a threshold value, and the conditional heteroscedasticity is explained through the GARCH errors. Model parameters are estimated using an iterative weighted least squares method. Our simulation study supports the consistency of the iterative estimation method. In addition, we show that the proposed THAR-GARCH model has better forecasting power by applying it to the realized volatilities of 21 major stock indices around the world.
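As a hedged reading of the model structure named in the abstract (the number of regimes, the choice of a single lagged threshold variable, and the GARCH(1,1) error order below are simplifying assumptions, not necessarily the authors' exact specification):

```latex
% HAR components: daily, weekly, and monthly averages of realized volatility
RV_t^{(d)} = RV_{t-1}, \qquad
RV_t^{(w)} = \tfrac{1}{5}\sum_{i=1}^{5} RV_{t-i}, \qquad
RV_t^{(m)} = \tfrac{1}{22}\sum_{i=1}^{22} RV_{t-i}

% Two-regime threshold HAR with GARCH(1,1) errors (illustrative simplification)
RV_t =
\begin{cases}
\beta_0 + \beta_d RV_t^{(d)} + \beta_w RV_t^{(w)} + \beta_m RV_t^{(m)} + \epsilon_t, & z_{t-1} \le r, \\
\gamma_0 + \gamma_d RV_t^{(d)} + \gamma_w RV_t^{(w)} + \gamma_m RV_t^{(m)} + \epsilon_t, & z_{t-1} > r,
\end{cases}
\qquad
\epsilon_t = \sigma_t e_t, \quad
\sigma_t^2 = \omega + \alpha\,\epsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2 .
```

The threshold r switches the HAR coefficients between regimes, while the GARCH recursion on the conditional variance captures the heteroscedasticity mentioned in the abstract.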

Estimation of CMIP5 based streamflow forecast and optimal training period using the Deep-Learning LSTM model (딥러닝 LSTM 모형을 이용한 CMIP5 기반 하천유량 예측 및 최적 학습기간 산정)

  • Chun, Beomseok;Lee, Taehwa;Kim, Sangwoo;Lim, Kyoung Jae;Jung, Younghun;Do, Jongwon;Shin, Yongchul
    • Proceedings of the Korea Water Resources Association Conference / 2022.05a / pp.353-353 / 2022
  • In this study, an optimal training period for streamflow forecasting was determined using CMIP5 (the fifth phase of the Coupled Model Intercomparison Project) future climate scenarios and a deep learning approach based on the LSTM (Long Short-Term Memory) model. The Seongsan-ri site in Jinan-gun was selected as the study area. With calibration (2000~2002/2014~2015) and validation (2003~2005/2016~2017) periods, the observed streamflow at the study site was compared with the LSTM-simulated streamflow, and overall the simulated values reproduced the observations well. In addition, to evaluate the long-term predictive performance of the LSTM model, LSTM-based streamflow was compared with SWAT-based streamflow over calibration (2000~2015) and validation (2016~2019) periods. Although some errors occurred in the simulations, the LSTM model estimated long-term streamflow well. Based on the validation results, SWAT-based streamflow was simulated with CMIP5 future climate scenario forcing for 2011~2100, and the simulated streamflow was used as training data for the LSTM model. Streamflow was then simulated with the LSTM and SWAT models under various training scenarios, and the correlation and uncertainty of the LSTM/SWAT-based streamflow were compared across the training scenarios to identify the optimal training period. The comparison showed that when the training period was at least 30 years, the uncertainty of the LSTM-based streamflow relative to the observed streamflow was low. Therefore, when the CMIP5 future climate scenarios are coupled with the deep-learning-based LSTM model to simulate long-term future daily streamflow, a training period of at least 30 years appears to be necessary to obtain reliable LSTM-based streamflow.
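The training-period experiment described above can be sketched as below. A simple stand-in regressor replaces the LSTM so that only the protocol is shown (train on windows of increasing length, score against a fixed evaluation block with NSE); the window lengths, look-back, and synthetic data are assumptions.

```python
# Sketch of the training-period experiment: the same model is trained on flow
# windows of 10-40 years and scored against a fixed evaluation period with NSE.
# A linear model stands in for the study's LSTM; the data here are synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression

def nse(obs, sim):
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def make_xy(flow, lookback=7):
    """Supervised pairs: previous `lookback` days of flow -> next day's flow."""
    X = np.array([flow[i - lookback:i] for i in range(lookback, len(flow))])
    return X, flow[lookback:]

rng = np.random.default_rng(0)
daily_flow = rng.gamma(2.0, 20.0, 90 * 365)        # ~90 years of synthetic daily flow

eval_X, eval_y = make_xy(daily_flow[-10 * 365:])   # fixed 10-year evaluation block
for years in (10, 20, 30, 40):                     # candidate training-period lengths
    train = daily_flow[-(years + 10) * 365:-10 * 365]
    X, y = make_xy(train)
    model = LinearRegression().fit(X, y)
    print(f"{years} training years -> NSE {nse(eval_y, model.predict(eval_X)):.3f}")
```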


Germinal Center Formation Controlled by Balancing Between Follicular Helper T Cells and Follicular Regulatory T Cells (여포 보조 T세포와 여포 조절 T세포의 균형 및 종자중심 형성)

  • Park, Hong-Jai;Kim, Do-Hyun;Choi, Je-Min
    • Hanyang Medical Reviews / v.33 no.1 / pp.10-16 / 2013
  • Follicular helper T cells (Tfh) play a significant role in providing T cell help to B cells during the germinal center reaction, where somatic hypermutation, affinity maturation, isotype class switching, and the differentiation of memory B cells and long-lived plasma cells occur. Antigen-specific T cells with IL-6 and IL-21 upregulate CXCR5, which is required for the migration of T cells into B cell follicles, where these T cells mature into Tfh. The surface markers including PD-1, ICOS, and CD40L play a significant role in providing T cell help to B cells. The upregulation of transcription factor Bcl-6 induces the expression of CXCR5, which is an important factor for Tfh differentiation, by inhibiting the expression of other lineage-specific transcription factors such as T-bet, GATA3, and RORγt. Surprisingly, recent evidence suggests that CD4 T cells already committed to Th1, Th2, and Th17 cells obtain flexibility in their differentiation programs by downregulating T-bet, GATA3, and RORγt, upregulating Bcl-6 and thus convert into Tfh. Limiting the numbers of Tfh within germinal centers is important in the regulation of the autoantibody production that is central to autoimmune diseases. Recently, it was revealed that the germinal center reaction and the size of the Tfh population are also regulated by thymus-derived follicular regulatory T cells (Tfr) expressing CXCR5 and Foxp3. Dysregulation of Tfh appears to be a pathogenic cause of autoimmune disease suggesting that tight regulation of Tfh and germinal center reaction by Tfr is essential for maintaining immune tolerance. Therefore, the balance between Tfh and Tfr appears to be a critical peripheral tolerance mechanism that can inhibit autoimmune disorders.

Comparative study of meteorological data for river level prediction model (하천 수위 예측 모델을 위한 기상 데이터 비교 연구)

  • Cho, Minwoo;Yoon, Jinwook;Kim, Changsu;Jung, Heokyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.05a / pp.491-493 / 2022
  • Flood damage due to torrential rains and typhoons is occurring in many parts of the world. In this paper, we propose a water level prediction model that uses water level, precipitation, and humidity data, key parameters for flood prediction, as input data. Based on the LSTM and GRU models, which have already proven their time-series prediction performance in many research fields, different input datasets were constructed from the ASOS (Automated Synoptic Observing System) and AWS (Automatic Weather System) data provided by the Korea Meteorological Administration, and performance comparison experiments were conducted. The best results were obtained when using the ASOS data. This paper thus provides a performance comparison according to the input data and, as future work, can serve as an initial study toward a system that supports advance evacuation decisions in connection with a flood risk determination model.
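A minimal sketch of this kind of comparison is shown below: the same sliding windows of water level, precipitation, and humidity are fed to an LSTM and a GRU and their errors are compared. The window length, layer sizes, and training loop are assumptions, and random data stand in for the ASOS/AWS inputs.

```python
# Sketch: compare LSTM and GRU regressors on the same sliding-window inputs
# (water level, precipitation, humidity). Shapes and training are illustrative.
import torch
import torch.nn as nn

def windows(series, lookback=24):
    """series: (T, n_features) -> inputs (N, lookback, n_features), next water level."""
    X = torch.stack([series[i - lookback:i] for i in range(lookback, len(series))])
    return X, series[lookback:, 0]                 # column 0 assumed to be water level

class RNNRegressor(nn.Module):
    def __init__(self, cell, n_features=3, hidden=32):
        super().__init__()
        self.rnn = cell(n_features, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):
        h, _ = self.rnn(x)
        return self.out(h[:, -1]).squeeze(-1)

series = torch.randn(1000, 3)                      # stand-in for an ASOS or AWS dataset
X, y = windows(series)
for name, cell in [("LSTM", nn.LSTM), ("GRU", nn.GRU)]:
    model = RNNRegressor(cell)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(5):                             # a few toy epochs
        loss = nn.functional.mse_loss(model(X), y)
        opt.zero_grad(); loss.backward(); opt.step()
    print(name, "MSE:", nn.functional.mse_loss(model(X), y).item())
```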


Controlling the false discovery rate in sparse VHAR models using knockoffs (KNOCKOFF를 이용한 성근 VHAR 모형의 FDR 제어)

  • Minsu, Park;Jaewon, Lee;Changryong, Baek
    • The Korean Journal of Applied Statistics / v.35 no.6 / pp.685-701 / 2022
  • The false discovery rate (FDR) is widely used in high-dimensional inference since it provides a more liberal criterion than the familywise error rate (FWER), which is known to be very conservative in controlling Type-I errors. This paper proposes a sparse VHAR model estimation method that controls the FDR by adapting the knockoff filter introduced by Barber and Candès (2015). We also compare the knockoff with a conventional method based on the adaptive Lasso (AL) through an extensive simulation study. We observe that AL shows sparsistency and decent forecasting performance; however, it is not satisfactory in controlling the FDR. To be more specific, AL tends to estimate zero coefficients as non-zero coefficients. On the other hand, the knockoff controls the FDR well below the desired level, but it finds too sparse a model when the sample size is small. However, the knockoff improves dramatically as the sample size increases and the model becomes sparser.
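For reference, the quantity being controlled and the knockoff selection rule the paper adapts can be written as follows; this is the generic knockoff+ rule of Barber and Candès (2015), stated without the VHAR-specific construction of the knockoff variables.

```latex
% FDR: expected proportion of false selections among all selected coefficients
\mathrm{FDR} \;=\; \mathbb{E}\!\left[\frac{\#\{j : \beta_j = 0,\ j \text{ selected}\}}
                                          {\max\bigl(\#\{j : j \text{ selected}\},\,1\bigr)}\right]

% Knockoff+ selection: with statistics W_j that tend to be large and positive only
% for truly relevant variables, select {j : W_j >= T}, where for target level q
T \;=\; \min\left\{\, t > 0 \;:\;
        \frac{1 + \#\{j : W_j \le -t\}}{\max\bigl(\#\{j : W_j \ge t\},\,1\bigr)} \;\le\; q \right\}.
```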

Futures Price Prediction based on News Articles using LDA and LSTM (LDA와 LSTM를 응용한 뉴스 기사 기반 선물가격 예측)

  • Jin-Hyeon Joo;Keun-Deok Park
    • Journal of Industrial Convergence / v.21 no.1 / pp.167-173 / 2023
  • Research has been published on predicting future data using regression analysis or artificial intelligence as a way of analyzing economic indicators. In this study, we designed a system that predicts prospective futures prices using artificial intelligence based on topic probability data obtained from past news articles through topic modeling. Topic probability distribution data for each news article were obtained using Latent Dirichlet Allocation (LDA), which extracts the topics of documents from past news articles via unsupervised learning. These topic probability distributions were then used as the input to a Long Short-Term Memory (LSTM) network, a variant of the Recurrent Neural Network (RNN), in order to predict prospective futures prices. The method proposed in this study was able to predict the trend of futures prices, and should later also be able to predict price trends for derivative products such as options. However, because statistical errors occurred for certain data, further research is required to improve accuracy.
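The LDA stage of this pipeline can be sketched in a few lines; the toy corpus and topic count below are assumptions, and in the study's setup the resulting per-article distributions would be aggregated by trading day and windowed before being fed to the LSTM.

```python
# Sketch of the feature pipeline: per-article topic probability vectors from LDA
# become the inputs of the downstream sequence model. Corpus and topic count are toy.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

articles = [
    "central bank signals rate hike amid inflation",
    "crop yields fall after drought hits producers",
    "chip makers report record orders for data centers",
]
counts = CountVectorizer().fit_transform(articles)     # bag-of-words per article
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_probs = lda.fit_transform(counts)                # (n_articles, n_topics); rows sum to 1
print(topic_probs)
# These rows, aggregated per day and stacked into sliding windows, would form the
# LSTM input sequence whose target is the next futures price.
```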

Smart contract-based Business Model for growth of Korea Fabless System Semiconductor (한국 팹리스 시스템 반도체 발전을 위한 스마트계약 기반 거래 모델)

  • Hyoung-woo Kim;Seng-phil Hong;Majer, Marko
    • Journal of Advanced Navigation Technology / v.27 no.2 / pp.235-246 / 2023
  • Amid the rapid technological development of artificial intelligence (AI), electric vehicles, and robots driven by the fourth industrial revolution, semiconductors determine core performance, and semiconductor competitiveness is directly related to national competitiveness. However, the competitiveness of the Korean semiconductor industry has continuously weakened in the system semiconductor field, excluding memory semiconductors. This study therefore proposes F-SBM (Fabless-Smart contract based Blockchain Model), a new smart contract-based blockchain business model for engaging the global market, which is the most urgent need for the growth of the Korean fabless system semiconductor industry in recession. Through the new F-SBM, we verified how fabless firms can engage new customers through a smart contract-based consortium blockchain covering the technology, economy, and reliability attributes of a fabless company. This model is significant in lowering the high entry barriers to engaging new customers, a long-cherished goal of the Korean fabless system semiconductor industry, and in deriving new growth solutions.

Short-Term Water Quality Prediction of the Paldang Reservoir Using Recurrent Neural Network Models (순환신경망 모델을 활용한 팔당호의 단기 수질 예측)

  • Jiwoo Han;Yong-Chul Cho;Soyoung Lee;Sanghun Kim;Taegu Kang
    • Journal of Korean Society on Water Environment / v.39 no.1 / pp.46-60 / 2023
  • Climate change causes fluctuations in water quality in the aquatic environment, which can cause changes in water circulation patterns and severe adverse effects on aquatic ecosystems in the future. Therefore, research is needed to predict and respond to water quality changes caused by climate change in advance. In this study, we tried to predict the dissolved oxygen (DO), chlorophyll-a, and turbidity of the Paldang reservoir for about two weeks using long short-term memory (LSTM) and gated recurrent units (GRU), which are deep learning algorithms based on recurrent neural networks. The model was built based on real-time water quality data and meteorological data. The observation period was set from July to September in the summer of 2021 (Period 1) and from March to May in the spring of 2022 (Period 2). We tried to select an algorithm with optimal predictive power for each water quality parameter. In addition, to improve the predictive power of the model, an important variable extraction technique using random forest was used to select only the important variables as input variables. In both Periods 1 and 2, the predictive power after extracting important variables was further improved. Except for DO in Period 2, GRU was selected as the best model in all water quality parameters. This methodology can be useful for preventive water quality management by identifying the variability of water quality in advance and predicting water quality in a short period.
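The random-forest variable screening mentioned in the abstract can be sketched as follows; the candidate variable names, the toy target, and the importance cut-off are assumptions used only to show the selection step that precedes the recurrent models.

```python
# Sketch of the variable-screening step: a random forest ranks candidate inputs
# and only the most important ones are kept for the LSTM/GRU models.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "water_temp": rng.normal(22, 3, 500),
    "pH": rng.normal(7.5, 0.3, 500),
    "flow_rate": rng.gamma(2.0, 50.0, 500),
    "air_temp": rng.normal(25, 4, 500),
    "solar_rad": rng.gamma(3.0, 5.0, 500),
})
y = 0.4 * X["water_temp"] - 0.004 * X["flow_rate"] + rng.normal(0, 1, 500)   # toy DO target

rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
importance = pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)
selected = importance[importance > importance.mean()].index.tolist()
print(importance)
print("selected inputs:", selected)
```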