• Title/Summary/Keyword: analysis of algorithms


Export Prediction Using Separated Learning Method and Recommendation of Potential Export Countries (분리학습 모델을 이용한 수출액 예측 및 수출 유망국가 추천)

  • Jang, Yeongjin;Won, Jongkwan;Lee, Chaerok
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.1
    • /
    • pp.69-88
    • /
    • 2022
  • One of the characteristics of South Korea's economic structure is that it is highly dependent on exports. Thus, many businesses are closely tied to the global economy and the diplomatic situation. In addition, small and medium-sized enterprises (SMEs) specialized in exporting are struggling due to the spread of COVID-19. Therefore, this study aimed to develop a model to forecast next year's exports to support SMEs' export strategy and decision making. This study also proposed a strategy to recommend promising export countries for each item based on the forecasting model. We analyzed important variables used in previous studies, such as country-specific, item-specific, and macro-economic variables, and collected those variables to train our prediction model. Next, through exploratory data analysis (EDA), it was found that exports, the target variable, have a highly skewed distribution. To deal with this issue and improve predictive performance, we suggest a separated learning method, in which the whole dataset is divided into homogeneous subgroups and a prediction algorithm is applied to each group. Thus, the characteristics of each group can be trained more precisely using different input variables and algorithms. In this study, we divided the dataset into five subgroups based on the export value to decrease the skewness of the target variable. After the separation, we found that each group has different characteristics in countries and goods. For example, in Group 1, most of the exporting countries are developing countries and the majority of exported goods are low-value products such as glass and prints. On the other hand, major export destinations of South Korea such as China, the USA, and Vietnam are included in Group 4 and Group 5, and most exported goods in these groups are high-value products. We then used LightGBM (LGBM) and the Exponential Moving Average (EMA) for prediction. Considering the characteristics of each group, models were built using LGBM for Groups 1 to 4 and EMA for Group 5. To evaluate the performance of the model, we compared different model structures and algorithms. As a result, the separated learning model showed the best performance among the compared models. After the model was built, we also provided the variable importance of each group using SHAP values to add explainability to our model. Based on the prediction model, we proposed a two-stage recommendation strategy for potential export countries. In the first phase, the BCG matrix was used to find Star and Question Mark markets that are expected to grow rapidly. In the second phase, we calculated scores for each country and made recommendations according to the ranking. Using this recommendation framework, potential export countries were selected and information about those countries for each item was presented. There are several implications of this study. First of all, most of the preceding studies have focused on a specific situation or country, whereas this study uses various variables and develops a machine learning model for a wide range of countries and items. Second, to our knowledge, it is the first attempt to adopt a separated learning method for export prediction. By separating the dataset into five homogeneous subgroups, we could enhance the predictive performance of the model, and a more detailed explanation of the models by group is provided using SHAP values. Lastly, this study has several practical implications.
There are several platforms that provide trade information, including those of KOTRA, but most of them are based on past data, so it is not easy for companies to predict future trends. By utilizing the model and recommendation strategy proposed in this research, the trade-related services of such platforms can be improved so that companies, including SMEs, can fully utilize them when making export strategies and decisions.
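As a rough illustration of the separated learning idea described in this abstract, one could split the data into target-value subgroups and fit one model per subgroup. The sketch below is an assumption-laden outline (quantile-based group boundaries, a log transform, generic LightGBM settings), not the authors' implementation, which also uses EMA for the top group.

```python
# Hedged sketch of separated learning: split rows by the target's magnitude,
# then train a separate LightGBM regressor per subgroup. The group
# boundaries, log transform, and hyperparameters here are illustrative.
import numpy as np
import pandas as pd
import lightgbm as lgb

def train_separated_models(df: pd.DataFrame, target: str = "exports", n_groups: int = 5):
    """Return one fitted model per quantile-based subgroup of the target."""
    df = df.copy()
    df["group"] = pd.qcut(df[target], q=n_groups, labels=False, duplicates="drop")
    models = {}
    for g, sub in df.groupby("group"):
        X = sub.drop(columns=[target, "group"])   # assumes numeric features
        y = np.log1p(sub[target])                 # reduce skewness of exports
        models[g] = lgb.LGBMRegressor(n_estimators=300).fit(X, y)
    return models
```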

Linearity Estimation of PET/CT Scanner in List Mode Acquisition (List Mode에서 PET/CT Scanner의 직선성 평가)

  • Choi, Hyun-Jun;Kim, Byung-Jin;Ito, Mikiko;Lee, Hong-Jae;Kim, Jin-Ui;Kim, Hyun-Joo;Lee, Jae-Sung;Lee, Dong-Soo
    • The Korean Journal of Nuclear Medicine Technology
    • /
    • v.16 no.1
    • /
    • pp.86-90
    • /
    • 2012
  • Purpose: Quantification of myocardial blood flow (MBF) using dynamic PET imaging has the potential to assess coronary artery disease. Rb-82 plays a key role in the clinical assessment of myocardial perfusion using PET. However, MBF can be overestimated due to underestimation of the left ventricular input function at the beginning of the acquisition when the scanner shows non-linearity between count rate and activity concentration because of dead-time. Therefore, in this study, we evaluated the count rate linearity as a function of the activity concentration in PET data acquired in list mode. Materials & methods: A cylindrical phantom (diameter 12 cm, length 10.5 cm) filled with 296 MBq of F-18 solution and 800 mL of water was used to estimate the linearity of the Biograph 40 True Point PET/CT scanner. PET data were acquired in list mode for 10 min per frame at one bed position for different activity concentration levels over 7 half-lives. The images were reconstructed with the OSEM and FBP algorithms. Prompt, net true, and random counts of the PET data were measured as a function of the activity concentration. Total and background counts were measured by drawing ROIs on the phantom images, and linearity was assessed after background correction. Results: The prompt count rates in list mode increased linearly in proportion to the activity concentration. At low activity concentrations (<30 kBq/mL), the prompt net true and random count rates increased with the activity concentration. At high activity concentrations (>30 kBq/mL), the rate of increase of the net true count rate decreased slightly, while that of the random counts increased. There was no difference in image intensity linearity between the OSEM and FBP algorithms. Conclusion: The Biograph 40 True Point PET/CT scanner showed good count rate linearity even at a high activity concentration (~370 kBq/mL). This result indicates that the scanner is suitable for the quantitative analysis of data in dynamic cardiac studies using Rb-82, N-13, O-15, and F-18.
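A minimal sketch of the linearity check described in this abstract is given below: fit a least-squares line of count rate against activity concentration and inspect the relative deviation. The numbers are placeholders, not measurements from the study.

```python
# Hedged sketch of a count-rate-vs-activity linearity check; all values are
# hypothetical placeholders, not data from the cited phantom experiment.
import numpy as np

activity = np.array([3.7, 7.5, 15.0, 30.0, 60.0, 120.0, 240.0, 370.0])  # kBq/mL
count_rate = np.array([1.1, 2.2, 4.4, 8.8, 17.4, 34.0, 66.0, 98.0])     # kcps

slope, intercept = np.polyfit(activity, count_rate, 1)      # least-squares line
predicted = slope * activity + intercept
deviation_pct = 100 * (count_rate - predicted) / predicted
print("max deviation from linear fit: %.1f%%" % np.max(np.abs(deviation_pct)))
```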


Quantitative Rainfall Estimation for S-band Dual Polarization Radar using Distributed Specific Differential Phase (분포형 비차등위상차를 이용한 S-밴드 이중편파레이더의 정량적 강우 추정)

  • Lee, Keon-Haeng;Lim, Sanghun;Jang, Bong-Joo;Lee, Dong-Ryul
    • Journal of Korea Water Resources Association
    • /
    • v.48 no.1
    • /
    • pp.57-67
    • /
    • 2015
  • One of the main benefits of a dual polarization radar is the improvement of quantitative rainfall estimation. In this paper, the performance of two representative rainfall estimation methods for dual polarization radar, the JPOLE and CSU algorithms, is compared using data from a MOLIT S-band dual polarization radar. In addition, this paper presents an evaluation of the specific differential phase ($K_{dp}$) retrieval algorithm proposed by Lim et al. (2013). Current $K_{dp}$ retrieval methods are based on range filtering techniques or regression analysis. However, these methods can underestimate peak $K_{dp}$ or produce negative values in convective regions, and yield fluctuating $K_{dp}$ in low-rain-rate regions. To resolve these problems, this study applied the $K_{dp}$ distribution method suggested by Lim et al. (2013) and evaluated it by adopting the new $K_{dp}$ in the JPOLE and CSU algorithms. Data were obtained from the Mt. Biseul radar of MOLIT for two rainfall events in 2012. The evaluation results showed improvement in the peak $K_{dp}$ and no fluctuating or negative $K_{dp}$ values. Also, in heavy rain (daily rainfall > 80 mm), the accumulated daily rainfall using the new $K_{dp}$ was closer to the AWS observation data than that using the legacy $K_{dp}$; in light rain (daily rainfall < 80 mm), the improvement was insignificant, because $K_{dp}$ is used mainly at high rain rates in the quantitative rainfall estimation algorithms.
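Blended rainfall algorithms such as JPOLE and CSU use power-law estimators of the form $R(K_{dp}) = a\,K_{dp}^{b}$ at high rain rates. The sketch below applies such an estimator; the coefficients are commonly cited S-band values used purely for illustration and are not necessarily those of the algorithms evaluated in the paper.

```python
# Illustrative power-law rainfall estimator R(Kdp) = a * |Kdp|**b * sign(Kdp);
# a = 44.0, b = 0.822 are commonly cited S-band coefficients, used here only
# as an example, not as the exact values from the JPOLE or CSU algorithms.
import numpy as np

def rain_rate_from_kdp(kdp_deg_per_km, a=44.0, b=0.822):
    """Return rain rate (mm/h) from specific differential phase (deg/km)."""
    kdp = np.asarray(kdp_deg_per_km, dtype=float)
    return np.sign(kdp) * a * np.abs(kdp) ** b

print(rain_rate_from_kdp([0.5, 1.0, 2.0]))  # mm/h for a few sample Kdp values
```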

Accuracy Analysis of Target Recognition according to EOC Conditions (Target Occlusion and Depression Angle) using MSTAR Data (MSTAR 자료를 이용한 EOC 조건(표적 폐색 및 촬영부각)에 따른 표적인식 정확도 분석)

  • Kim, Sang-Wan;Han, Ahrim;Cho, Keunhoo;Kim, Donghan;Park, Sang-Eun
    • Korean Journal of Remote Sensing
    • /
    • v.35 no.3
    • /
    • pp.457-470
    • /
    • 2019
  • Automatic Target Recognition (ATR) using Synthetic Aperture Radar (SAR) has attracted attention in the fields of surveillance, reconnaissance, and national security due to its all-weather, day-and-night imaging capability. However, there have been difficulties in automatically identifying targets in real situations due to various observational and environmental conditions. In this paper, ATR problems under Extended Operating Conditions (EOC) were investigated. In particular, we considered partial occlusions of the target (10% to 50%) and differences in the depression angle between training ($17^{\circ}$) and test data ($30^{\circ}$ and $45^{\circ}$). To simulate various occlusion conditions, the SARBake algorithm was applied to Moving and Stationary Target Acquisition and Recognition (MSTAR) images. The ATR accuracies were evaluated using the template matching and Adaboost algorithms. The experimental results on the depression angle showed that the target identification rate of the two algorithms decreased by more than 30% from the depression angle of $45^{\circ}$ to $30^{\circ}$. The accuracy of template matching was about 75.88%, while Adaboost showed better results with an accuracy of about 86.80%. In the case of partial occlusion, the accuracy of template matching decreased significantly even under slight occlusion (from 95.77% with no occlusion to 52.69% under 10% occlusion). The Adaboost algorithm showed better performance, with an accuracy of 85.16% with no occlusion and 68.48% under 10% occlusion. Even under 50% occlusion, Adaboost provided an accuracy of 52.48%, which was much higher than that of template matching (less than 30%).
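Of the two classifiers compared above, template matching is the simpler to sketch: a test chip is assigned the class of the training template with the highest normalized cross-correlation. The code below is a simplified assumption about the matching rule, not the paper's exact implementation.

```python
# Hedged sketch of template matching by normalized cross-correlation (NCC);
# chips are assumed to be equally sized 2-D arrays already centered on the
# target, which simplifies the matching rule used in typical SAR ATR studies.
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two equally sized image chips."""
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    return float(np.mean(a * b))

def classify(chip, templates):
    """templates: dict mapping class label -> list of template chips."""
    scores = {label: max(ncc(chip, t) for t in ts) for label, ts in templates.items()}
    return max(scores, key=scores.get)
```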

A Checklist to Improve the Fairness in AI Financial Service: Focused on the AI-based Credit Scoring Service (인공지능 기반 금융서비스의 공정성 확보를 위한 체크리스트 제안: 인공지능 기반 개인신용평가를 중심으로)

  • Kim, HaYeong;Heo, JeongYun;Kwon, Hochang
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.3
    • /
    • pp.259-278
    • /
    • 2022
  • With the spread of Artificial Intelligence (AI), various AI-based services are expanding in the financial sector, such as service recommendation, automated customer response, fraud detection systems (FDS), and credit scoring services. At the same time, problems related to reliability and unexpected social controversies are also occurring due to the nature of data-based machine learning. Against this background, this study aimed to contribute to improving trust in AI-based financial services by proposing a checklist to secure fairness in AI-based credit scoring services, which directly affect consumers' financial lives. Among the key elements of trustworthy AI, such as transparency, safety, accountability, and fairness, fairness was selected as the subject of the study so that everyone can enjoy the benefits of automated algorithms from the perspective of inclusive finance, without social discrimination. Through a literature review, we divided the entire fairness-related operation process into three areas: data, algorithms, and users. For each area, we constructed four detailed considerations for evaluation, resulting in a 12-item checklist. The relative importance and priority of the categories were evaluated through the analytic hierarchy process (AHP). We used three different groups representing the full range of financial stakeholders: financial field workers, artificial intelligence field workers, and general users. The three groups were classified and analyzed according to the importance each stakeholder assigned, and from a practical perspective, specific checks were identified, such as verifying the feasibility of using training data and non-financial information and monitoring newly inflowing data. Moreover, general financial consumers were found to place high importance on the accuracy of result analysis and on bias checks. We expect these results to contribute to the design and operation of fair AI-based financial services.
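The AHP weighting mentioned above can be sketched as the principal-eigenvector method applied to a pairwise comparison matrix. The 3x3 matrix below, comparing the data, algorithm, and user areas, is a hypothetical example and not the survey result reported in the paper.

```python
# Hedged sketch of AHP priority weights via the principal eigenvector of a
# pairwise comparison matrix; the matrix entries below are hypothetical.
import numpy as np

pairwise = np.array([
    [1.0, 3.0, 2.0],   # data      vs. (data, algorithm, user)
    [1/3, 1.0, 1/2],   # algorithm vs. (data, algorithm, user)
    [1/2, 2.0, 1.0],   # user      vs. (data, algorithm, user)
])
eigvals, eigvecs = np.linalg.eig(pairwise)
principal = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
weights = principal / principal.sum()
print(dict(zip(["data", "algorithm", "user"], np.round(weights, 3))))
```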

Predicting Forest Gross Primary Production Using Machine Learning Algorithms (머신러닝 기법의 산림 총일차생산성 예측 모델 비교)

  • Lee, Bora;Jang, Keunchang;Kim, Eunsook;Kang, Minseok;Chun, Jung-Hwa;Lim, Jong-Hwan
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.21 no.1
    • /
    • pp.29-41
    • /
    • 2019
  • Terrestrial Gross Primary Production (GPP) is the largest global carbon flux, and forest ecosystems are important because of their ability to store significantly more carbon than other terrestrial ecosystems. There have been several attempts to estimate GPP using mechanism-based models. However, mechanism-based models, which incorporate biological, chemical, and physical processes, are limited by a lack of flexibility in predicting non-stationary ecological processes caused by local and global change. Instead, mechanism-free methods are strongly recommended for estimating nonlinear dynamics that occur in nature, such as GPP. Therefore, we used mechanism-free machine learning techniques to estimate daily GPP. In this study, support vector machine (SVM), random forest (RF), and artificial neural network (ANN) models were used and compared with a traditional multiple linear regression model (LM). MODIS products and meteorological parameters from eddy covariance data were employed to train the machine learning and LM models over 2006 to 2013. The GPP prediction models were compared with daily GPP from eddy covariance measurements in a deciduous forest in South Korea in 2014 and 2015. Statistical measures including the correlation coefficient (R), root mean square error (RMSE), and mean squared error (MSE) were used to evaluate model performance. In general, the machine-learning models (R = 0.85 - 0.93, MSE = 1.00 - 2.05, p < 0.001) showed better performance than the linear regression model (R = 0.82 - 0.92, MSE = 1.24 - 2.45, p < 0.001). These results demonstrate the high predictability, and the potential for wider application, of mechanism-free machine-learning models combined with remote sensing for predicting non-stationary ecological processes such as seasonal GPP.
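The model comparison described in this abstract follows a standard pattern: fit each learner on the same predictors and compare R and RMSE on held-out data. The sketch below assumes generic scikit-learn estimators and placeholder hyperparameters; it is not the authors' configuration.

```python
# Hedged sketch of comparing SVM, RF, ANN, and a linear model for GPP
# prediction; features, hyperparameters, and the train/test split are
# placeholders, not the settings used in the cited study.
import numpy as np
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def compare_models(X_train, y_train, X_test, y_test):
    models = {
        "SVM": SVR(),
        "RF": RandomForestRegressor(n_estimators=500, random_state=0),
        "ANN": MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0),
        "LM": LinearRegression(),
    }
    for name, model in models.items():
        pred = model.fit(X_train, y_train).predict(X_test)
        r = np.corrcoef(y_test, pred)[0, 1]                 # correlation coefficient
        rmse = mean_squared_error(y_test, pred) ** 0.5      # root mean square error
        print(f"{name}: R = {r:.2f}, RMSE = {rmse:.2f}")
```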

Video Scene Detection using Shot Clustering based on Visual Features (시각적 특징을 기반한 샷 클러스터링을 통한 비디오 씬 탐지 기법)

  • Shin, Dong-Wook;Kim, Tae-Hwan;Choi, Joong-Min
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.2
    • /
    • pp.47-60
    • /
    • 2012
  • Video data is unstructured and has a complex structure. As the importance of efficient management and retrieval of video data increases, studies on video parsing based on the visual features contained in the video content have been conducted to reconstruct video data into a meaningful structure. Early studies on video parsing focused on splitting video data into shots, but detecting shot boundaries defined by physical boundaries does not consider the semantic association of video data. Recently, studies that use clustering methods to structure semantically associated video shots into video scenes, defined by semantic boundaries, have been actively pursued. Previous studies on video scene detection try to detect scenes by applying clustering algorithms based on a similarity measure between video shots that depends mainly on color features. However, the correct identification of a video shot or scene and the detection of gradual transitions such as dissolves, fades, and wipes are difficult because the color features of video data contain noise and change abruptly due to the intervention of unexpected objects. In this paper, to solve these problems, we propose the Scene Detector using Color histogram, corner Edge and Object color histogram (SDCEO), which clusters similar shots organizing the same event based on visual features, including the color histogram, corner edges, and the object color histogram, to detect video scenes. The SDCEO is noteworthy in that it uses the edge feature together with the color feature, and as a result, it effectively detects gradual as well as abrupt transitions. The SDCEO consists of the Shot Bound Identifier and the Video Scene Detector. The Shot Bound Identifier comprises a Color Histogram Analysis step and a Corner Edge Analysis step. In the Color Histogram Analysis step, SDCEO uses the color histogram feature to organize shot boundaries. The color histogram, recording the percentage of each quantized color among all pixels in a frame, is chosen for its good performance, as also reported in other work on content-based image and video analysis. To organize shot boundaries, SDCEO joins associated sequential frames into shot boundaries by measuring the similarity of the color histograms between frames. In the Corner Edge Analysis step, SDCEO identifies the final shot boundaries by using the corner edge feature: it detects associated shot boundaries by comparing the corner edge feature between the last frame of the previous shot boundary and the first frame of the next. In the Key-frame Extraction step, SDCEO compares each frame with all other frames, measures the similarity using the Euclidean distance between histograms, and then selects as the key-frame the frame most similar to all frames contained in the same shot boundary. The Video Scene Detector clusters associated shots organizing the same event by utilizing hierarchical agglomerative clustering based on visual features including the color histogram and the object color histogram. After detecting video scenes, SDCEO organizes the final video scenes by repeated clustering until the similarity distance between shot boundaries is less than the threshold h. In this paper, we construct a prototype of SDCEO and carry out experiments with manually constructed baseline data; the experimental results, with a shot boundary detection precision of 93.3% and a video scene detection precision of 83.3%, are satisfactory.
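The color-histogram step of the Shot Bound Identifier can be sketched as follows: quantize each frame's colors into a joint histogram and flag a boundary when the Euclidean distance between consecutive histograms exceeds a threshold. The bin count and threshold below are illustrative assumptions, not SDCEO's tuned values.

```python
# Hedged sketch of histogram-based shot boundary detection; the bin count
# and distance threshold are illustrative, not the values used by SDCEO.
import numpy as np

def color_histogram(frame, bins=8):
    """frame: HxWx3 uint8 array; returns a normalized joint RGB histogram."""
    hist, _ = np.histogramdd(frame.reshape(-1, 3), bins=(bins, bins, bins),
                             range=((0, 256), (0, 256), (0, 256)))
    return hist.ravel() / hist.sum()

def shot_boundaries(frames, threshold=0.3):
    """Return frame indices where the histogram distance jumps above threshold."""
    hists = [color_histogram(f) for f in frames]
    return [i for i in range(1, len(hists))
            if np.linalg.norm(hists[i] - hists[i - 1]) > threshold]
```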

Analysis of Significance between SWMM Computer Simulation and Artificial Rainfall on Rainfall Runoff Delay Effects of Vegetation Unit-type LID System (식생유니트형 LID 시스템의 우수유출 지연효과에 대한 SWMM 전산모의와 인공강우 모니터링 간의 유의성 분석)

  • Kim, Tae-Han;Choi, Boo-Hun
    • Journal of the Korean Institute of Landscape Architecture
    • /
    • v.48 no.3
    • /
    • pp.34-44
    • /
    • 2020
  • In order to suggest directions for the performance analysis of ecological components based on a vegetation-based LID system model, this study analyzes the statistical significance between SWMM computer simulations and monitoring results obtained with rainfall and run-off simulation devices, and provides basic data required for a preliminary system design. The study also comprehensively reviews the vegetation-based LID system's soil, vegetation model, and analysis plans, which were less addressed in previous studies, and suggests a direction for performance quantification so that the system could substitute for device-type LID systems. After 40 minutes of artificial rainfall, the test group zone and the control group zone recorded maximum rainfall intensities of 142.91 mm/hr (n=3, sd=0.34) and 142.24 mm/hr (n=3, sd=0.90), respectively. Compared to the hyetograph, low rainfall intensity was reproduced in the 10-minute and 50-minute sections, and high rainfall intensity was confirmed in the 20-, 30-, and 40-minute sections. As for the rainwater run-off delay effect, run-off intensity in the test group zone was reduced by 79.8%, recording 0.46 mm/min at the 50-minute point, when the run-off intensity in the control group zone was at its highest. In the computer simulation, run-off intensity in the test group zone was reduced by 99.1%, recording 0.05 mm/min at the 50-minute point, when the run-off intensity was at its highest. The maximum rainfall run-off intensity in the test group zone (Dv=30.35, NSE=0.36) was 0.77 mm/min in the artificial rainfall monitoring and 1.06 mm/min in the SWMM computer simulation, in both cases at the 70-minute point. Likewise, the control group zone (Dv=17.27, NSE=0.78) recorded 2.26 mm/min and 2.38 mm/min, respectively, at the 50-minute point. By statistically assessing the significance between the rainfall and run-off simulating systems and the SWMM computer simulations, this study was able to suggest a preliminary design direction for the rainwater run-off reduction performance of an LID system with a single vegetation type. Also, by comprehensively examining the LID system's soil and vegetation models and analysis methods, this study was able to compile parameter quantification plans for the vegetation and soil sectors that can be aligned with a preliminary design. However, physical variability was introduced by the use of a single vegetation-based LID system, and follow-up studies are required on algorithms for calibrating the statistical significance between monitoring and computer simulation results.
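The two goodness-of-fit statistics quoted in this abstract, Dv and NSE, can be computed as below; the observed and simulated series are placeholders for the monitoring and SWMM outputs.

```python
# Hedged sketch of the goodness-of-fit metrics: Nash-Sutcliffe efficiency
# (NSE) and percent volume deviation (Dv) between observed and simulated
# run-off series; the input arrays are the user's own data.
import numpy as np

def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, values < 0 are poor."""
    obs, sim = np.asarray(observed, float), np.asarray(simulated, float)
    return 1 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def dv_percent(observed, simulated):
    """Percent deviation of total simulated volume from total observed volume."""
    obs, sim = np.asarray(observed, float), np.asarray(simulated, float)
    return 100 * (obs.sum() - sim.sum()) / obs.sum()
```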

Comparing Prediction Uncertainty Analysis Techniques of SWAT Simulated Streamflow Applied to Chungju Dam Watershed (충주댐 유역의 유출량에 대한 SWAT 모형의 예측 불확실성 분석 기법 비교)

  • Joh, Hyung-Kyung;Park, Jong-Yoon;Jang, Cheol-Hee;Kim, Seong-Joon
    • Journal of Korea Water Resources Association
    • /
    • v.45 no.9
    • /
    • pp.861-874
    • /
    • 2012
  • To ensure the applicability of the Soil and Water Assessment Tool (SWAT) model, it is important that the model undergoes careful calibration and uncertainty analysis. In recent years, many researchers have proposed various uncertainty analysis techniques for the SWAT model. To determine the differences and similarities of typical techniques, we applied three uncertainty analysis procedures included in the SWAT Calibration and Uncertainty Program (SWAT-CUP) to the Chungju Dam watershed (6,581.1 $km^2$) of South Korea: the Sequential Uncertainty Fitting algorithm ver. 2 (SUFI2), Generalized Likelihood Uncertainty Estimation (GLUE), and Parameter Solution (ParaSol). As a result, there was no significant difference in the objective function values between the SUFI2 and GLUE algorithms. However, the ParaSol algorithm showed the worst objective function values, and the 95PPU bands also diverged considerably from one another. The p-factor and r-factor showed differences of 0.02 to 0.79 and 0.03 to 0.52 for streamflow, respectively. In general, the ParaSol algorithm showed the lowest p-factor and r-factor, while the SUFI2 algorithm showed the highest. Therefore, for automatic SWAT model calibration and uncertainty analysis, we suggest calibration methods that consider the p-factor and r-factor. The p-factor is the percentage of observations covered by the 95PPU (95 Percent Prediction Uncertainty) band, and the r-factor is the average thickness of the 95PPU band relative to the standard deviation of the observed data.
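The two metrics defined at the end of this abstract can be written down directly: the p-factor is the fraction of observations falling inside the 95PPU band, and the r-factor is the band's mean width normalized by the standard deviation of the observations. The sketch below is a straightforward implementation of those definitions, not code from SWAT-CUP.

```python
# Hedged sketch of the SWAT-CUP uncertainty metrics; obs, lower, and upper
# are the observed series and the lower/upper bounds of the 95PPU band.
import numpy as np

def p_factor(obs, lower, upper):
    """Fraction of observations enclosed by the 95PPU band."""
    obs, lower, upper = map(np.asarray, (obs, lower, upper))
    return float(np.mean((obs >= lower) & (obs <= upper)))

def r_factor(obs, lower, upper):
    """Average 95PPU band width divided by the standard deviation of obs."""
    obs, lower, upper = map(np.asarray, (obs, lower, upper))
    return float(np.mean(upper - lower) / obs.std())
```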

Enhancing Predictive Accuracy of Collaborative Filtering Algorithms using the Network Analysis of Trust Relationship among Users (사용자 간 신뢰관계 네트워크 분석을 활용한 협업 필터링 알고리즘의 예측 정확도 개선)

  • Choi, Seulbi;Kwahk, Kee-Young;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.3
    • /
    • pp.113-127
    • /
    • 2016
  • Among the techniques for recommendation, collaborative filtering (CF) is commonly recognized as the most effective for implementing recommender systems. Until now, CF has been widely studied and adopted in both academic and real-world applications. The basic idea of CF is to create recommendation results by finding correlations between users of a recommendation system. A CF system compares users based on how similar they are and recommends products to a user by using like-minded users' evaluations of each product. Thus, it is very important to compute evaluation similarities among users in CF because the recommendation quality depends on it. Typical CF uses users' explicit numeric ratings of items (i.e., quantitative information) when computing similarities among users. In other words, users' numeric ratings have been the sole source of user preference information in traditional CF. However, user ratings do not always fully reflect users' actual preferences. According to several studies, users more actively accept the recommendations of people they trust when purchasing goods. Thus, trust relationships can be regarded as an informative source for identifying users' preferences accurately. Against this background, we propose a new hybrid recommender system that fuses CF and social network analysis (SNA). The proposed system adopts a recommendation algorithm that additionally reflects the results of SNA. In detail, our proposed system is based on conventional memory-based CF, but it is designed to use both users' numeric ratings and trust relationship information between users when calculating user similarities. For this, our system creates and uses not only a user-item rating matrix but also a user-to-user trust network. As methods for calculating similarity between users, we proposed two alternatives. One is an algorithm that calculates the degree of similarity between users by utilizing in-degree and out-degree centrality, which are indices representing central location in a social network; we named these approaches 'Trust CF - All' and 'Trust CF - Conditional'. The other alternative is an algorithm that weights a neighbor's score more highly when the target user trusts the neighbor directly or indirectly; the direct or indirect trust relationship can be identified by searching the users' trust network. We call this approach 'Trust CF - Search'. To validate the applicability of the proposed system, we used experimental data provided by LibRec, crawled from the entire FilmTrust website. It consists of movie ratings and a trust relationship network indicating whom each user trusts. The experimental system was implemented using Microsoft Visual Basic for Applications (VBA) and UCINET 6. To examine the effectiveness of the proposed system, we compared the performance of our proposed method with that of a conventional CF system. The performance of the recommender systems was evaluated using the average MAE (mean absolute error). The analysis results confirmed that when the in-degree centrality index of the users' trust network was applied without conditions (i.e., Trust CF - All), the accuracy (MAE = 0.565134) was lower than that of conventional CF (MAE = 0.564966). When the in-degree centrality index was applied only to users whose out-degree centrality was above a certain threshold (i.e., Trust CF - Conditional), the proposed system improved the accuracy slightly (MAE = 0.564909) compared to traditional CF. However, the algorithm that searches the users' trust network (i.e., Trust CF - Search) showed the best performance (MAE = 0.564846), and a paired-samples t-test showed that Trust CF - Search outperformed conventional CF at the 10% statistical significance level. Our study sheds light on the use of users' trust relationship network information to facilitate electronic commerce by recommending appropriate items to users.
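A memory-based CF predictor that blends rating similarity with a trust weight can be sketched as below. The blending scheme and data structures are simplified assumptions for illustration; they are not the exact 'Trust CF - All/Conditional/Search' variants evaluated in the paper.

```python
# Hedged sketch of trust-augmented memory-based CF: a neighbor's rating is
# weighted by a mix of rating similarity and a binary trust indicator.
# The alpha blend and dict-based data layout are illustrative assumptions.
import numpy as np

def pearson_sim(u, v, ratings):
    """Pearson correlation over items rated by both users u and v."""
    common = [(ratings[(u, i)], ratings[(v, i)])
              for (user, i) in ratings if user == u and (v, i) in ratings]
    if len(common) < 2:
        return 0.0
    a, b = np.array(common, dtype=float).T
    if a.std() == 0 or b.std() == 0:
        return 0.0
    return float(np.corrcoef(a, b)[0, 1])

def predict_rating(target, item, ratings, trust, alpha=0.5):
    """ratings: {(user, item): rating}; trust: {(truster, trustee): 1.0}."""
    num, den = 0.0, 0.0
    for (user, rated_item), r in ratings.items():
        if rated_item != item or user == target:
            continue
        sim = pearson_sim(target, user, ratings)
        weight = alpha * sim + (1 - alpha) * trust.get((target, user), 0.0)
        num += weight * r
        den += abs(weight)
    return num / den if den else None
```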