• Title/Summary/Keyword: Neural networks model

Research Status of Satellite-based Evapotranspiration and Soil Moisture Estimations in South Korea (위성기반 증발산량 및 토양수분량 산정 국내 연구동향)

  • Choi, Ga-young;Cho, Younghyun
    • Korean Journal of Remote Sensing / v.38 no.6_1 / pp.1141-1180 / 2022
  • The application of satellite imagery in hydrology and water resources has increased in recent years. However, obtaining accurate evapotranspiration and soil moisture remains challenging. Recent research has therefore emphasized the need for satellite-based estimates of evapotranspiration and soil moisture and for related development work. In this study, we present the research status in Korea by investigating current trends and methodologies for estimating evapotranspiration and soil moisture. Examining the detailed methodologies, we found that evapotranspiration is generally estimated using energy balance models such as the Surface Energy Balance Algorithm for Land (SEBAL) and Mapping Evapotranspiration with Internalized Calibration (METRIC). The Penman-Monteith and Priestley-Taylor equations are also used to estimate evapotranspiration. Soil moisture is generally estimated using passive (AMSR-E, AMSR2, MIRAS, and SMAP) and active (ASCAT and SAR) sensors. On the statistical side, deep learning, linear regression equations, and artificial neural networks are used to estimate these parameters. A number of studies calculated various indices from satellite-based data and applied them to the characterization of drought. In some cases, the hydrological cycle components of evapotranspiration and soil moisture were calculated with a Land Surface Model (LSM). By comparing, reviewing, and presenting the major methodologies in detail, we intend these references to be used in related research and to lay the foundation for future advances in calculating satellite-based hydrological cycle data.
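As a rough illustration of one of the equations named in the abstract above, the Priestley-Taylor form can be sketched as follows. The function name, argument names, and sample inputs are assumptions for illustration and do not come from the reviewed studies; 1.26 is the commonly used default for the coefficient.

```python
def priestley_taylor_le(rn, g, delta, gamma, alpha=1.26):
    """Latent heat flux (W/m^2) from the Priestley-Taylor equation.

    rn, g  : net radiation and ground heat flux (W/m^2)
    delta  : slope of the saturation vapour pressure curve (kPa/degC)
    gamma  : psychrometric constant (kPa/degC)
    alpha  : Priestley-Taylor coefficient (1.26 is the commonly used default)
    """
    return alpha * delta / (delta + gamma) * (rn - g)


# Hypothetical inputs, for illustration only.
le = priestley_taylor_le(rn=400.0, g=40.0, delta=0.189, gamma=0.066)
print(f"latent heat flux: {le:.1f} W/m^2")
```

In a satellite workflow the net radiation and temperature-dependent terms would typically come from gridded products rather than scalars.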

Detection of Proximal Caries Lesions with Deep Learning Algorithm (심층학습 알고리즘을 활용한 인접면 우식 탐지)

  • Hyuntae, Kim;Ji-Soo, Song;Teo Jeon, Shin;Hong-Keun, Hyun;Jung-Wook, Kim;Ki-Taeg, Jang;Young-Jae, Kim
    • Journal of the Korean Academy of Pediatric Dentistry / v.49 no.2 / pp.131-139 / 2022
  • This study aimed to evaluate the effectiveness of deep convolutional neural networks (CNNs) for the diagnosis of interproximal caries in pediatric intraoral radiographs. A total of 500 intraoral radiographic images of first and second primary molars were used for the study. A CNN model (ResNet-50) was applied for the detection of proximal caries. The diagnostic accuracy, sensitivity, specificity, receiver operating characteristic (ROC) curve, and area under the ROC curve (AUC) were calculated on the test dataset. The diagnostic accuracy was 0.84, sensitivity was 0.74, and specificity was 0.94. The trained CNN algorithm achieved an AUC of 0.86. The diagnostic CNN model for pediatric intraoral radiographs showed good performance with high accuracy. Deep learning can assist dentists in the diagnosis of proximal caries lesions in pediatric intraoral radiographs.
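The three threshold-based metrics reported above follow directly from confusion-matrix counts. A minimal sketch is given below; the counts are hypothetical and were chosen only so that a balanced test set reproduces the reported figures, since the actual test-set composition is not given in the abstract.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, and specificity from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return accuracy, sensitivity, specificity


# Hypothetical counts (not taken from the paper): 50 carious and 50 sound surfaces.
acc, sens, spec = diagnostic_metrics(tp=37, fn=13, tn=47, fp=3)
print(acc, sens, spec)
```

With balanced classes, accuracy is the mean of sensitivity and specificity, which is consistent with the reported 0.84, 0.74, and 0.94.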

A Comparative Study on Data Augmentation Using Generative Models for Robust Solar Irradiance Prediction

  • Jinyeong Oh;Jimin Lee;Daesungjin Kim;Bo-Young Kim;Jihoon Moon
    • Journal of the Korea Society of Computer and Information / v.28 no.11 / pp.29-42 / 2023
  • In this paper, we propose a method to enhance the prediction accuracy of solar irradiance for three major South Korean cities: Seoul, Busan, and Incheon. Our method entails the development of five generative models (vanilla GAN, CTGAN, Copula GAN, WGAN-GP, and TVAE) to generate independent variables that mimic the patterns of existing training data. To mitigate bias in model training, we derive values for the dependent variables using random forests and deep neural networks, enriching the training datasets. These datasets are integrated with existing data to form comprehensive solar irradiance prediction models. Experiments showed that models trained on the augmented datasets performed significantly better than those trained solely on the original data. Specifically, CTGAN showed outstanding results due to its sophisticated mechanism for handling the intricacies of multivariate data relationships, ensuring that the generated data are diverse and closely aligned with the real-world variability of solar irradiance. The proposed method is expected to address the issue of data scarcity by augmenting the training data with high-quality synthetic data, thereby contributing to the operation of solar power systems for sustainable development.
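The augmentation pipeline described above (generate synthetic independent variables, derive their dependent values with a separate model, then merge with the original data) can be sketched in plain Python. This is only a structural sketch: Gaussian jitter stands in for the generative models and a nearest-neighbour lookup stands in for the random forest and deep neural network labellers. All function names and data values are hypothetical.

```python
import random


def generate_synthetic_features(real_rows, n_samples, noise=0.05, seed=0):
    """Stand-in generator: jitter randomly chosen real rows with Gaussian noise.

    A GAN or TVAE would be trained here instead; this is only a placeholder.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_samples):
        base = rng.choice(real_rows)
        synthetic.append([x * (1.0 + rng.gauss(0.0, noise)) for x in base])
    return synthetic


def label_with_nearest_neighbour(real_rows, real_targets, rows):
    """Stand-in labeller: copy the target of the closest real row.

    The paper derives targets with random forests and deep neural networks.
    """
    labels = []
    for row in rows:
        dists = [sum((a - b) ** 2 for a, b in zip(row, r)) for r in real_rows]
        labels.append(real_targets[dists.index(min(dists))])
    return labels


# Hypothetical training data: [temperature, humidity] -> irradiance.
X_real = [[20.0, 0.40], [25.0, 0.35], [15.0, 0.60], [30.0, 0.30]]
y_real = [520.0, 640.0, 310.0, 720.0]

X_syn = generate_synthetic_features(X_real, n_samples=8)
y_syn = label_with_nearest_neighbour(X_real, y_real, X_syn)

# Augmented training set = original rows + labelled synthetic rows.
X_aug = X_real + X_syn
y_aug = y_real + y_syn
print(len(X_aug), len(y_aug))
```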

Analysis of the Impact of Satellite Remote Sensing Information on the Prediction Performance of Ungauged Basin Stream Flow Using Data-driven Models (인공위성 원격 탐사 정보가 자료 기반 모형의 미계측 유역 하천유출 예측성능에 미치는 영향 분석)

  • Seo, Jiyu;Jung, Haeun;Won, Jeongeun;Choi, Sijung;Kim, Sangdan
    • Journal of Wetlands Research / v.26 no.2 / pp.147-159 / 2024
  • Lack of streamflow observations makes model calibration difficult and limits model performance improvement. Satellite-based remote sensing products offer a new alternative as they can be actively utilized to obtain hydrological data. Recently, several studies have shown that artificial intelligence-based solutions are more appropriate than traditional conceptual and physical models. In this study, a data-driven approach combining various recurrent neural networks and decision tree-based algorithms is proposed, and the utilization of satellite remote sensing information for AI training is investigated. The satellite imagery used in this study is from MODIS and SMAP. The proposed approach is validated using publicly available data from 25 watersheds. Inspired by the traditional regionalization approach, a strategy is adopted to train a single data-driven model by integrating data from all basins, and the potential of the proposed approach is evaluated in a leave-one-out cross-validation regionalization setting, in which one model predicts streamflow for different basins. The GRU + LightGBM model was found to be a suitable model combination for target basins and showed good streamflow prediction performance in ungauged basins (the average model efficiency coefficient for predicting daily streamflow in 25 ungauged basins is 0.7187), except for periods when streamflow is very small. The influence of satellite remote sensing information was found to be up to 10%, with the additional application of satellite information having a greater impact on streamflow prediction during low or dry seasons than during wet or normal seasons.
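The leave-one-out regionalization setting mentioned above treats each basin in turn as ungauged and trains on the rest. A minimal sketch of the split logic follows; the basin identifiers and the function name are hypothetical, and model training itself is omitted.

```python
def leave_one_basin_out(basin_ids):
    """Yield (held_out, training_basins) pairs, one per basin."""
    unique = sorted(set(basin_ids))
    for held_out in unique:
        train = [b for b in unique if b != held_out]
        yield held_out, train


# Hypothetical basin identifiers; the study uses 25 watersheds.
basins = [f"basin_{i:02d}" for i in range(1, 26)]
splits = list(leave_one_basin_out(basins))
print(len(splits))            # one fold per basin
held_out, train = splits[0]
print(held_out, len(train))
```

Each fold would train one model on the 24 remaining basins and score it on the held-out basin, and the per-basin scores would then be averaged.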

Development of Sentiment Analysis Model for the hot topic detection of online stock forums (온라인 주식 포럼의 핫토픽 탐지를 위한 감성분석 모형의 개발)

  • Hong, Taeho;Lee, Taewon;Li, Jingjing
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.187-204 / 2016
  • Document classification based on emotional polarity has become an important emerging task owing to the explosion of data on the Web. In the big data age, there are too many information sources to consult when making decisions. For example, when considering travel to a city, a person may search for reviews through a search engine such as Google or on social networking services (SNSs) such as blogs, Twitter, and Facebook. The emotional polarity of positive and negative reviews helps a user decide whether or not to make the trip. Sentiment analysis of customer reviews has become an important research topic as data mining technology is widely applied to text mining of the Web. Sentiment analysis classifies documents through machine learning techniques such as decision trees, neural networks, and support vector machines (SVMs), and is used to determine the attitude, position, and sensibility of people who write articles about various topics published on the Web. Regardless of their polarity, emotional reviews are very helpful material for analyzing customers' opinions. Sentiment analysis helps reveal quickly what customers really want, using automated text mining techniques that extract subjective information from text on the Web. In this study, we developed a model that selects hot topics from user posts on China's online stock forum by using the k-means algorithm and a self-organizing map (SOM). In addition, we developed a detection model to predict hot topics by using machine learning techniques such as logit, decision trees, and SVM. We employed sentiment analysis to develop our model for the selection and detection of hot topics from China's online stock forum. The sentiment analysis calculates a sentiment value for a document based on contrast and classification according to a polarity (positive or negative) sentiment dictionary. The online stock forum is an attractive site because of its information about stock investment. Users post numerous texts about stock movements, analyzing the market in light of government policy announcements, market reports, reports from economic research institutes, and even rumors. We divided the online forum's topics into 21 categories for sentiment analysis. One hundred forty-four topics were selected from the 21 categories. The posts were crawled to build a positive and negative text database. After preprocessing the text from March 2013 to February 2015, we ultimately obtained 21,141 posts on 88 topics. An interest index was defined to select the hot topics, and the k-means algorithm and SOM produced equivalent results on this data. We developed decision tree models to detect hot topics with three algorithms: CHAID, CART, and C4.5. The results of CHAID were inferior to the others. We also employed SVM to detect hot topics from negative data. The SVM models were trained with the radial basis function (RBF) kernel, tuned by grid search. Detecting hot topics through sentiment analysis gives investors the latest trends and hot topics in the stock forum, so that they no longer need to search the vast amount of information on the Web. Our proposed model also helps to rapidly determine customers' signals or attitudes towards government policy and firms' products and services.
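To make the dictionary-based step above concrete, here is a minimal sketch of scoring a post against a polarity lexicon. The scoring rule (positive minus negative matches, normalized by total matches), the tiny English lexicon, and the sample post are all assumptions for illustration. The study's own dictionary and formula are not reproduced here.

```python
def sentiment_score(tokens, positive_words, negative_words):
    """Return (positive - negative) / matched words, in [-1, 1]; 0.0 if no match."""
    pos = sum(1 for t in tokens if t in positive_words)
    neg = sum(1 for t in tokens if t in negative_words)
    matched = pos + neg
    return (pos - neg) / matched if matched else 0.0


# Hypothetical mini-lexicon and post, for illustration only.
POSITIVE = {"rally", "gain", "bullish"}
NEGATIVE = {"crash", "loss", "bearish"}

post = "bullish rally expected despite small loss".split()
print(sentiment_score(post, POSITIVE, NEGATIVE))
```

Per-post scores like this could then be aggregated by topic before clustering or classification.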

Multi-task Learning Based Tropical Cyclone Intensity Monitoring and Forecasting through Fusion of Geostationary Satellite Data and Numerical Forecasting Model Output (정지궤도 기상위성 및 수치예보모델 융합을 통한 Multi-task Learning 기반 태풍 강도 실시간 추정 및 예측)

  • Lee, Juhyun;Yoo, Cheolhee;Im, Jungho;Shin, Yeji;Cho, Dongjin
    • Korean Journal of Remote Sensing / v.36 no.5_3 / pp.1037-1051 / 2020
  • Accurate monitoring and forecasting of tropical cyclone (TC) intensity can effectively reduce the overall costs of disaster management. In this study, we proposed a multi-task learning (MTL) based deep learning model for real-time TC intensity estimation and forecasting with lead times of 6-12 hours following the event, based on the fusion of geostationary satellite images and numerical forecast model output. A total of 142 TCs that developed in the Northwest Pacific from 2011 to 2016 were used in this study. Communication, Ocean and Meteorological Satellite (COMS) Meteorological Imager (MI) data were used to extract typhoon images, and the Climate Forecast System version 2 (CFSv2) provided by the National Centers for Environmental Prediction (NCEP) was employed to extract air and ocean forecasting data. This study suggested two schemes with different input variables to the MTL models. Scheme 1 used only satellite-based input data, while scheme 2 used both satellite images and numerical forecast model output. For real-time TC intensity estimation, both schemes exhibited similar performance. For TC intensity forecasting with lead times of 6 and 12 hours, scheme 2 improved the performance by 13% and 16%, respectively, in terms of the root mean squared error (RMSE) when compared to scheme 1. Relative root mean squared errors (rRMSE) for most intensity levels were less than 30%. Lower mean absolute error (MAE) and RMSE were found for the lower TC intensity levels. In the test results for typhoon HALONG in 2014, scheme 1 tended to overestimate the intensity by about 20 kts at the early development stage. Scheme 2 slightly reduced the error, resulting in an overestimation of about 5 kts. The MTL models reduced the computational cost by about 300% when compared to the single-tasking model, which suggested the feasibility of the rapid production of TC intensity forecasts.
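The error measures used above can be sketched as follows. The rRMSE definition here (RMSE as a percentage of the mean observed value) is an assumption, since the abstract does not state the normalization. The intensity values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def relative_rmse(observed, predicted):
    """RMSE as a percentage of the mean observed value (assumed definition)."""
    return 100.0 * rmse(observed, predicted) / (sum(observed) / len(observed))


# Hypothetical intensities in knots, for illustration only.
obs = [35.0, 50.0, 65.0, 90.0]
pred = [40.0, 45.0, 70.0, 85.0]
print(rmse(obs, pred), relative_rmse(obs, pred))
```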

The Prediction of Purchase Amount of Customers Using Support Vector Regression with Separated Learning Method (Support Vector Regression에서 분리학습을 이용한 고객의 구매액 예측모형)

  • Hong, Tae-Ho;Kim, Eun-Mi
    • Journal of Intelligence and Information Systems / v.16 no.4 / pp.213-225 / 2010
  • With the rapid growth of information technology, data mining has empowered managers in charge of marketing tasks to offer personalized and differentiated marketing programs to their customers. Most studies on customer response have focused on predicting whether customers would respond to a marketing promotion, as marketing managers have been eager to identify likely respondents. Many data mining studies have addressed binary decision problems such as bankruptcy prediction, network intrusion detection, and credit card fraud detection. The prediction of customer response has been studied with similar methods because it is also a dichotomous decision problem. In addition, a number of competitive data mining techniques such as neural networks, support vector machines (SVMs), decision trees, logit, and genetic algorithms have been applied to predicting customer response to marketing promotions. Marketing managers have also tried to classify their customers with quantitative measures such as recency, frequency, and monetary value, acquired from their transaction databases. These measures indicate how recently customers purchased, how frequently they purchased in a period, and how much they spent per purchase. Using segmented customers, we propose an approach that can differentiate customers within the same rating. Our approach employs support vector regression (SVR) to forecast the purchase amount of customers for each customer rating. Our study used a sample of 41,924 customers extracted from the DMEF04 data set who had purchased at least once in the last two years. We classified customers from the first to the fifth rating based on their purchase amount after a marketing promotion. Here, the first rating comprises customers with large purchase amounts and the fifth rating comprises non-respondents to the promotion. Our proposed model forecasts the purchase amount of customers in the same rating, so that marketing managers can design a differentiated and personalized marketing program for each customer even when customers belong to the same rating. In addition, we propose a more efficient learning method that separates the learning samples. We employed two learning methods to compare the proposed method with a general learning method for SVRs. LMW (Learning Method using Whole data for purchasing customers) is a general learning method for forecasting the purchase amount of customers. Our proposed method, LMS (Learning Method using Separated data for classification purchasing customers), builds four different SVR models, one for each class of customers. To evaluate the models, we calculated the MAE (Mean Absolute Error) and MAPE (Mean Absolute Percent Error) of each model's predicted purchase amounts. In LMW, the overall performance was 0.670 MAPE and the best performance was 0.327 MAPE. In LMS, the best performance was 0.275 MAPE. The LMS models generally outperformed the LMW model, and LMS performed better than LMW in each class of customers. In addition, our approach will be useful for marketing managers when they need to select customers for their promotion. Even when customers belong to the same class, marketing managers can offer them a differentiated and personalized marketing promotion.
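The contrast between the whole-data and separated learning methods above can be illustrated with a small sketch. To keep it dependency-free, a one-variable least-squares line stands in for the SVR models. The records are hypothetical, and the comparison is in-sample only, so it shows the structure of the two approaches, not the paper's results.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (a stand-in for an SVR model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return a, my - a * mx


def mape(actual, predicted):
    """Mean absolute percentage error, expressed as a fraction."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical records: (customer class, past spend, purchase amount).
data = [(1, 10.0, 105.0), (1, 12.0, 118.0), (1, 15.0, 152.0),
        (2, 10.0, 31.0), (2, 12.0, 35.0), (2, 15.0, 46.0)]

# LMW-style: one model fitted on all purchasing customers.
a, b = fit_line([x for _, x, _ in data], [y for _, _, y in data])
whole = mape([y for _, _, y in data], [a * x + b for _, x, _ in data])

# LMS-style: a separate model per customer class.
actual, predicted = [], []
for cls in sorted({c for c, _, _ in data}):
    rows = [(x, y) for c, x, y in data if c == cls]
    a_c, b_c = fit_line([x for x, _ in rows], [y for _, y in rows])
    actual += [y for _, y in rows]
    predicted += [a_c * x + b_c for x, _ in rows]
separated = mape(actual, predicted)

print(f"whole-data MAPE: {whole:.3f}, separated MAPE: {separated:.3f}")
```

When classes differ strongly in purchase level, as in this toy data, a single pooled model is pulled toward the overall mean, which is the intuition behind fitting one model per class.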

Optimization of Multiclass Support Vector Machine using Genetic Algorithm: Application to the Prediction of Corporate Credit Rating (유전자 알고리즘을 이용한 다분류 SVM의 최적화: 기업신용등급 예측에의 응용)

  • Ahn, Hyunchul
    • Information Systems Review / v.16 no.3 / pp.161-177 / 2014
  • Corporate credit rating assessment consists of complicated processes in which various factors describing a company are taken into consideration. Such assessment is known to be very expensive since domain experts must be employed to assess the ratings. As a result, data-driven corporate credit rating prediction using statistical and artificial intelligence (AI) techniques has received considerable attention from researchers and practitioners. In particular, statistical methods such as multiple discriminant analysis (MDA) and multinomial logistic regression analysis (MLOGIT), and AI methods including case-based reasoning (CBR), artificial neural network (ANN), and multiclass support vector machine (MSVM) have been applied to corporate credit rating. Among them, MSVM has recently become popular because of its robustness and high prediction accuracy. In this study, we propose a novel optimized MSVM model and apply it to corporate credit rating prediction in order to enhance accuracy. Our model, named 'GAMSVM (Genetic Algorithm-optimized Multiclass Support Vector Machine),' is designed to simultaneously optimize the kernel parameters and the feature subset selection. Prior studies such as Lorena and de Carvalho (2008) and Chatterjee (2013) show that proper kernel parameters may improve the performance of MSVMs. Also, the results of studies such as Shieh and Yang (2008) and Chatterjee (2013) imply that appropriate feature selection may lead to higher prediction accuracy. Based on these prior studies, we propose to apply GAMSVM to corporate credit rating prediction. As a tool for optimizing the kernel parameters and the feature subset selection, we suggest the genetic algorithm (GA). GA is known as an efficient and effective search method that attempts to simulate biological evolution. By applying genetic operations such as selection, crossover, and mutation, it is designed to gradually improve the search results. In particular, the mutation operator prevents GA from falling into local optima, so that a globally optimal or near-optimal solution can be found. GA has been widely applied to search for optimal parameters or feature subsets of AI techniques including MSVM. For these reasons, we also adopt GA as an optimization tool. To empirically validate the usefulness of GAMSVM, we applied it to a real-world case of credit rating in Korea. Our application is in bond rating, which is the most frequently studied area of credit rating for specific debt issues or other financial obligations. The experimental dataset was collected from a large credit rating company in South Korea. It contained 39 financial ratios of 1,295 companies in the manufacturing industry, and their credit ratings. Using various statistical methods including one-way ANOVA and stepwise MDA, we selected 14 financial ratios as the candidate independent variables. The dependent variable, i.e. credit rating, was labeled as four classes: 1 (A1); 2 (A2); 3 (A3); 4 (B and C). Eighty percent of the data for each class was used for training, and the remaining 20 percent was used for validation. To overcome the small sample size, we applied five-fold cross validation to our dataset. In order to examine the competitiveness of the proposed model, we also experimented with several comparative models including MDA, MLOGIT, CBR, ANN, and MSVM. In the case of MSVM, we adopted the One-Against-One (OAO) and DAGSVM (Directed Acyclic Graph SVM) approaches because they are known to be the most accurate among the various MSVM approaches. GAMSVM was implemented using LIBSVM, an open-source software package, and Evolver 5.5, a commercial software package that enables GA. The other comparative models were run using various statistical and AI packages such as SPSS for Windows, Neuroshell, and Microsoft Excel VBA (Visual Basic for Applications). Experimental results showed that the proposed model, GAMSVM, outperformed all the competing models. In addition, the model was found to use fewer independent variables while showing higher accuracy. In our experiments, five variables, namely X7 (total debt), X9 (sales per employee), X13 (years after founded), X15 (accumulated earning to total asset), and X39 (the index related to the cash flows from operating activity), were found to be the most important factors in predicting corporate credit ratings. However, the values of the finally selected kernel parameters were found to be almost the same among the data subsets. To examine whether the predictive performance of GAMSVM was significantly greater than that of the other models, we used the McNemar test. As a result, we found that GAMSVM was better than MDA, MLOGIT, CBR, and ANN at the 1% significance level, and better than OAO and DAGSVM at the 5% significance level.
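The idea of optimizing kernel parameters and a feature subset together can be sketched with a small genetic algorithm. This is not the GAMSVM implementation: the chromosome layout, parameter ranges, operator choices, and especially `toy_fitness` are assumptions for illustration. A real run would replace `toy_fitness` with the cross-validated accuracy of a multiclass SVM trained with the decoded parameters and selected features.

```python
import random

N_FEATURES = 14  # the study starts from 14 candidate financial ratios


def random_chromosome(rng):
    """Kernel parameters (log2 C, log2 gamma) plus a 0/1 feature mask."""
    params = [rng.uniform(-5, 15), rng.uniform(-15, 3)]
    return params + [rng.randint(0, 1) for _ in range(N_FEATURES)]


def toy_fitness(chrom):
    """Placeholder fitness; a real run would return cross-validated MSVM accuracy."""
    log_c, log_g, mask = chrom[0], chrom[1], chrom[2:]
    target = [1, 0] * (N_FEATURES // 2)          # arbitrary "ideal" subset
    hits = sum(m == t for m, t in zip(mask, target))
    return hits - 0.1 * abs(log_c - 3) - 0.1 * abs(log_g + 5)


def crossover(a, b, rng):
    """Single-point crossover."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:]


def mutate(chrom, rng, rate=0.05):
    """Perturb real-valued genes and flip mask bits with probability `rate`."""
    out = list(chrom)
    for i in range(len(out)):
        if rng.random() < rate:
            if i < 2:
                out[i] += rng.gauss(0, 1)   # kernel parameter gene
            else:
                out[i] = 1 - out[i]         # feature mask gene
    return out


def run_ga(generations=30, pop_size=20, seed=0):
    rng = random.Random(seed)
    pop = [random_chromosome(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=toy_fitness, reverse=True)
        history.append(toy_fitness(pop[0]))
        parents = pop[: pop_size // 2]             # truncation selection
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - 1)
        ]
        pop = [pop[0]] + children                  # elitism keeps the best so far
    return max(pop, key=toy_fitness), history


best, history = run_ga()
print(round(toy_fitness(best), 3), best[2:])
```

Encoding both parts in one chromosome lets the search trade off kernel settings against the feature subset in a single fitness evaluation, rather than tuning them in sequence.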

Long-Term Memory and Correct Answer Rate of Foreign Exchange Data (환율데이타의 장기기억성과 정답율)

  • Weon, Sek-Jun
    • The Transactions of the Korea Information Processing Society / v.7 no.12 / pp.3866-3873 / 2000
  • In this paper, we investigate the long-term memory and the correct answer rate of foreign exchange data (Yen/Dollar), which is one kind of economic time series. There are many cases where two kinds of fractal dimensions exist in time series generated from dynamical systems such as AR models, which are typical models with short-term memory. The sample interval separating these two dimensions is denoted by $k^{crossover}$. Let the fractal dimension be $D_1$ for $k < k^{crossover}$ and $D_2$ for $k > k^{crossover}$ in the statistical model. Usually, statistical models have dimensions $D_1$ and $D_2$ such that $D_1 < D_2$ and $D_2 \cong 2$. However, real time series such as the NIKKEI show the opposite result. The exchange data, which is a real time series, has the relation $D_1 > D_2$. When the interval between data increases, the correlation between data increases, which is quite a peculiar phenomenon. We predict exchange data with neural networks. We confirm that $\beta$ obtained from prediction errors and $D$ calculated from the time series data precisely satisfy the relationship $\beta = 2 - 2D$, which is provided by a non-linear model having fractal dimension. We also found that the difference in fractal dimension appeared in the correct answer rate.

Anatomical Brain Connectivity Map of Korean Children (한국 아동 집단의 구조 뇌연결지도)

  • Um, Min-Hee;Park, Bum-Hee;Park, Hae-Jeong
    • Investigative Magnetic Resonance Imaging / v.15 no.2 / pp.110-122 / 2011
  • Purpose: The purpose of this study is to establish a method for generating anatomical brain connectivity maps from Korean children and evaluating network topological properties using small-world network analysis. Materials and Methods: Using diffusion tensor images (DTI) and parcellation maps of structural MRIs acquired from twelve healthy Korean children, we generated a structural brain connectivity matrix for each individual. We applied a one-sample t-test to the connectivity maps to derive a representative anatomical connectivity map for the group. By spatially normalizing the white matter bundles of participants into a standard template space, we obtained the anatomical brain network model. Network properties including clustering coefficient, characteristic path length, and global/local efficiency were also calculated. Results: We found that the structural connectivity of the Korean children group preserves small-world properties. The anatomical connectivity map obtained in this study showed that the children group had higher intra-hemispheric connectivity than inter-hemispheric connectivity. We also observed that the neural connectivity of the group is high between the brain stem and sensorimotor areas. Conclusion: We proposed a method to examine the anatomical brain network of a Korean children group. The proposed method can be used to evaluate the efficiency of anatomical brain networks in people with disease.
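Two of the network properties named above can be computed from an adjacency structure as sketched below. The sketch assumes a binary, undirected graph, whereas connectivity matrices derived from DTI are often weighted and would need thresholding first. The four-region graph is hypothetical.

```python
from collections import deque


def clustering_coefficient(adj):
    """Mean local clustering coefficient of an undirected, unweighted graph."""
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def characteristic_path_length(adj):
    """Mean shortest-path length over all connected ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# Hypothetical 4-region graph: a triangle (0, 1, 2) with region 3 attached to 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(clustering_coefficient(graph), characteristic_path_length(graph))
```

A small-world network is usually characterized by a clustering coefficient well above that of a comparable random graph together with a similarly short characteristic path length.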