• Title/Summary/Keyword: Data optimization


Development of a Stock Trading System Using M & W Wave Patterns and Genetic Algorithms (M&W 파동 패턴과 유전자 알고리즘을 이용한 주식 매매 시스템 개발)

  • Yang, Hoonseok;Kim, Sunwoong;Choi, Heung Sik
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.63-83
    • /
    • 2019
  • Investors prefer to look for trading points based on chart patterns rather than on complex analyses such as corporate intrinsic value analysis or technical indicator analysis. However, pattern analysis is difficult and has been computerized less than users need. In recent years, there have been many studies of stock price patterns using machine learning techniques, including neural networks, in the field of artificial intelligence (AI). In particular, advances in IT have made it easier to analyze huge volumes of chart data to find patterns that can predict stock prices. Although short-term price forecasting power has improved, long-term forecasting power remains limited, so these methods are used for short-term trading rather than long-term investment. Other studies have focused on mechanically and accurately identifying patterns that earlier technology could not recognize, but such approaches can be vulnerable in practice because whether the patterns found are suitable for trading is a separate question. When such studies find a meaningful pattern, they locate a point that matches the pattern and then measure performance after n days, assuming a purchase at that point in time. Since this approach calculates virtual returns, it can diverge considerably from reality. Whereas existing research tries to find patterns with price-prediction power, this study proposes to define the patterns first and to trade when a pattern with a high success probability appears. The M&W wave patterns published by Merrill (1980) are simple because they can be distinguished by five turning points. Despite reports that some patterns have price predictability, no performance results from the actual market had been reported. The simplicity of a pattern consisting of five turning points has the advantage of reducing the cost of increasing pattern-recognition accuracy. 
In this study, 16 up-conversion patterns and 16 down-conversion patterns are reclassified into ten groups so that they can be easily implemented in the system, and only the one pattern with the highest success rate per group is selected for trading. Patterns that had a high probability of success in the past are likely to succeed in the future, so we trade when such a pattern occurs. The measurement reflects a realistic situation because it assumes that both the buy and the sell have actually been executed. We tested three ways to calculate the turning points. The first, the minimum-change-rate zig-zag method, removes price movements below a certain percentage and then identifies the vertices. In the second, the high-low-line zig-zag method, a high price that meets the n-day high-price line is taken as a peak, and a low price that meets the n-day low-price line is taken as a valley. In the third, the swing wave method, a central high price higher than the n high prices to its left and right is taken as a peak, and a central low price lower than the n low prices to its left and right is taken as a valley. The swing wave method was superior to the other methods in our tests; we interpret this to mean that trading after confirming the completion of a pattern is more effective than trading while the pattern is still unfinished. Genetic algorithms (GA) were the most suitable solution because the number of cases was too large to search exhaustively for patterns with high success rates. We also performed the simulation using the walk-forward analysis (WFA) method, which separates the test section from the application section, so we were able to respond appropriately to market changes. In this study, we optimize at the level of the stock portfolio, because optimizing the variables for each individual stock carries a risk of over-optimization. 
Therefore, we set the number of constituent stocks to 20 to increase the diversification effect while avoiding over-optimization. We tested the KOSPI market by dividing it into six categories. The small-cap portfolio was the most successful and the high-volatility portfolio was the second best. This shows that prices need some volatility for patterns to take shape, but that the highest volatility is not necessarily the best.
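The swing wave rule described in the abstract can be sketched in a few lines; this is a minimal illustration assuming a plain list of prices (the function name and sample data are hypothetical, not from the paper):

```python
def swing_turning_points(prices, n=2):
    """Swing wave rule: a point is a peak if it is higher than the n
    prices on each side, and a valley if it is lower than the n prices
    on each side. Returns (index, kind) pairs in chronological order."""
    points = []
    for i in range(n, len(prices) - n):
        left = prices[i - n:i]
        right = prices[i + 1:i + 1 + n]
        if prices[i] > max(left + right):
            points.append((i, "peak"))
        elif prices[i] < min(left + right):
            points.append((i, "valley"))
    return points
```

With n = 1, a short alternating series such as [1, 3, 2, 5, 1, 4, 2] yields five alternating turning points, which is exactly the five-point structure the M&W patterns are built from.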

Evaluation of Dose Distributions Recalculated with Per-field Measurement Data under the Condition of Respiratory Motion during IMRT for Liver Cancer (간암 환자의 세기조절방사선치료 시 호흡에 의한 움직임 조건에서 측정된 조사면 별 선량결과를 기반으로 재계산한 체내 선량분포 평가)

  • Song, Ju-Young;Kim, Yong-Hyeob;Jeong, Jae-Uk;Yoon, Mee Sun;Ahn, Sung-Ja;Chung, Woong-Ki;Nam, Taek-Keun
    • Progress in Medical Physics
    • /
    • v.25 no.2
    • /
    • pp.79-88
    • /
    • 2014
  • The dose distributions within the real volumes of tumor targets and critical organs during internal target volume-based intensity-modulated radiation therapy (ITV-IMRT) for liver cancer were recalculated by applying the effects of actual respiratory organ motion, and the dosimetric features were analyzed by comparison with the gating IMRT (Gate-IMRT) plan results. The ITV was created using MIM software, and a moving phantom was used to simulate respiratory motion. The doses were recalculated with the 3DVH (3-dimensional dose-volume histogram) program based on the per-field data measured with a MapCHECK2 two-dimensional diode detector array. Although a sufficient prescription dose covered the PTV during ITV-IMRT delivery, the dose homogeneity in the PTV was inferior to that of the Gate-IMRT plan. We confirmed that the organs at risk (OARs) received higher doses with ITV-IMRT, as expected with an enlarged field, but the increase in spinal cord dose was not significant, and the increased doses to the liver and kidney could be considered minor when reinforced constraints were applied during IMRT plan optimization. Because the Gate-IMRT method also has disadvantages, such as unexpected dosimetric variations when applying the gating system and an increased treatment time, it is better to analyze in advance the patient's respiratory condition and the importance and fulfillment of the IMRT plan dose constraints in order to select the optimal IMRT method for correcting the effect of respiratory organ motion.

Social Network-based Hybrid Collaborative Filtering using Genetic Algorithms (유전자 알고리즘을 활용한 소셜네트워크 기반 하이브리드 협업필터링)

  • Noh, Heeryong;Choi, Seulbi;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.2
    • /
    • pp.19-38
    • /
    • 2017
  • The collaborative filtering (CF) algorithm has been popularly used for implementing recommender systems, and there have been many prior studies to improve its accuracy. Among them, some recent studies adopt a 'hybrid recommendation approach', which enhances conventional CF by using additional information. In this research, we propose a new hybrid recommender system that fuses CF with the results of social network analysis on trust and distrust relationship networks among users to enhance prediction accuracy. The proposed algorithm is based on memory-based CF, but when calculating the similarity between users it considers not only the correlation of the users' numeric rating patterns but also the users' in-degree centrality values derived from the trust and distrust relationship networks. Specifically, it amplifies the similarity between a target user and a neighbor when the neighbor has high in-degree centrality in the trust relationship network, and attenuates the similarity when the neighbor has high in-degree centrality in the distrust relationship network. The algorithm considers four types of user relationships in total (direct trust, indirect trust, direct distrust, and indirect distrust) and uses four adjusting coefficients that set the level of amplification/attenuation for the in-degree centrality values derived from the direct/indirect trust and distrust relationship networks. To determine the optimal adjusting coefficients, genetic algorithms (GA) were adopted; accordingly, we named the proposed algorithm SNACF-GA (Social Network Analysis-based CF using GA). To validate the performance of SNACF-GA, we used a real-world data set, the 'Extended Epinions dataset' provided by trustlet.org. 
The data set contains user responses (rating scores and reviews) after purchasing specific items (e.g. car, movie, music, book) as well as trust/distrust relationship information indicating whom each user trusts or distrusts. The experimental system was developed mainly using Microsoft Visual Basic for Applications (VBA), but we also used UCINET 6 to calculate the in-degree centralities of the trust/distrust relationship networks, and Palisade Software's Evolver, a commercial package that implements genetic algorithms. To examine the effectiveness of the proposed system more precisely, we adopted two comparison models. The first is conventional CF, which uses only users' explicit numeric ratings when calculating similarities between users; that is, it does not consider trust/distrust relationships at all. The second is SNACF (Social Network Analysis-based CF), which differs from SNACF-GA in that it considers only direct trust/distrust relationships and does not use GA optimization. The performances of the proposed algorithm and the comparison models were evaluated using average MAE (mean absolute error). The experiments showed that the optimal adjusting coefficients for direct trust, indirect trust, direct distrust, and indirect distrust were 0, 1.4287, 1.5, and 0.4615, respectively. This implies that distrust relationships between users are more important than trust relationships in recommender systems. In terms of recommendation accuracy, SNACF-GA (avg. MAE = 0.111943), which reflects both direct and indirect trust/distrust relationship information, outperformed conventional CF (avg. MAE = 0.112638) and also showed better accuracy than SNACF (avg. MAE = 0.112209). To confirm whether these differences are statistically significant, we applied paired-samples t-tests. 
The paired-samples t-tests showed that the difference between SNACF-GA and conventional CF was statistically significant at the 1% level, and the difference between SNACF-GA and SNACF at the 5% level. Our study found that trust/distrust relationships can be important information for improving the performance of recommendation algorithms. In particular, distrust relationship information had the greater impact on the performance improvement of CF. This implies that we need to pay more attention to distrust (negative) relationships than to trust (positive) ones when tracking and managing social relationships between users.
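The amplification/attenuation step described above can be sketched as follows; this is a hypothetical simplification (one linear factor instead of the paper's four GA-tuned coefficients, with illustrative parameter names):

```python
def adjusted_similarity(pearson_sim, trust_centrality, distrust_centrality,
                        a_trust=1.0, a_distrust=1.0):
    """Amplify a base Pearson similarity when the neighbor has high
    in-degree centrality in the trust network, and attenuate it for
    centrality in the distrust network. a_trust and a_distrust play the
    role of the adjusting coefficients tuned by the GA in the paper."""
    factor = 1.0 + a_trust * trust_centrality - a_distrust * distrust_centrality
    return pearson_sim * max(factor, 0.0)  # never flip the sign of similarity
```

For example, a neighbor with trust centrality 0.2 and distrust centrality 0.1 has its similarity of 0.5 boosted to 0.55, while a heavily distrusted neighbor can be suppressed to zero.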

Recent Progress in Air-Conditioning and Refrigeration Research : A Review of Papers Published in the Korean Journal of Air-Conditioning and Refrigeration Engineering in 2016 (설비공학 분야의 최근 연구 동향 : 2016년 학회지 논문에 대한 종합적 고찰)

  • Lee, Dae-Young;Kim, Sa Ryang;Kim, Hyun-Jung;Kim, Dong-Seon;Park, Jun-Seok;Ihm, Pyeong Chan
    • Korean Journal of Air-Conditioning and Refrigeration Engineering
    • /
    • v.29 no.6
    • /
    • pp.327-340
    • /
    • 2017
  • This article reviews the papers published in the Korean Journal of Air-Conditioning and Refrigeration Engineering during 2016. It is intended to capture the status of current research in the areas of heating, cooling, ventilation, sanitation, and indoor environments of buildings and plant facilities. Conclusions are as follows. (1) The research works on thermal and fluid engineering have been reviewed in the groups of flow, heat and mass transfer, reduction of pollutant exhaust gas, cooling and heating, renewable energy systems, and flow around buildings. CFD schemes were used more in all research areas. (2) Research works in the heat transfer area have been reviewed in the categories of heat transfer characteristics, pool boiling and condensing heat transfer, and industrial heat exchangers. Research on heat transfer characteristics included the long-term performance variation of a plate-type enthalpy exchange element made of paper, design optimization of an extruded-type cooling structure for reducing the weight of LED street lights, and hot-plate welding of thermoplastic elastomer packing. In the area of pool boiling and condensing, the heat transfer characteristics of a finned-tube heat exchanger in a PCM (phase change material) thermal energy storage system, the influence of flow boiling heat transfer on fouling in nanofluids, and PCM under simultaneous charging and discharging were studied. In the area of industrial heat exchangers, a one-dimensional flow network model, a porous-media model, and R245fa in a plate-shell heat exchanger were studied. (3) Various studies were published in the categories of refrigeration cycle, alternative refrigeration/energy systems, and system control. 
In the refrigeration cycle category, subjects include a mobile cold-storage heat exchanger, compressor reliability, an indirect refrigeration system with CO₂ as the secondary fluid, a heat pump for fuel-cell vehicles, heat recovery from a hybrid drier, and heat exchangers with two-port and flat tubes. In the alternative refrigeration/energy system category, subjects include a membrane module for dehumidification refrigeration, desiccant-assisted low-temperature drying, a regenerative evaporative cooler, and ejector-assisted multi-stage evaporation. In the system control category, subjects include multi-refrigeration system control, emergency cooling of data centers, and variable-speed compressor control. (4) In the building mechanical system research field, fifteen studies were reported on effective design of mechanical systems and on maximizing the energy efficiency of buildings. The topics included energy performance, HVAC systems, ventilation, renewable energies, etc. The proposed designs and the performance tests using numerical methods and experiments provide useful information and key data that could help improve the energy efficiency of buildings. (5) The field of architectural environment mostly focused on indoor environment and building energy. The main indoor-environment studies concerned the analysis of indoor thermal environments controlled by portable coolers, the effects of outdoor wind pressure on airflow in high-rise buildings, window air tightness in relation to filling-piece shapes, the stack effect in a core-type office building, and the development of a movable drawer-type light shelf with an adjustable reflector depth. 
The building-energy studies covered energy consumption analysis in office buildings, prediction of the exit air temperature of a horizontal geothermal heat exchanger, LS-SVM based modeling of the hot water supply load for a district heating system, the energy saving effect of an ERV system using a night purge control method, and the effect of strengthened insulation levels on building heating and cooling loads.

Application of The Semi-Distributed Hydrological Model(TOPMODEL) for Prediction of Discharge at the Deciduous and Coniferous Forest Catchments in Gwangneung, Gyeonggi-do, Republic of Korea (경기도(京畿道) 광릉(光陵)의 활엽수림(闊葉樹林)과 침엽수림(針葉樹林) 유역(流域)의 유출량(流出量) 산정(算定)을 위한 준분포형(準分布型) 수문모형(水文模型)(TOPMODEL)의 적용(適用))

  • Kim, Kyongha;Jeong, Yongho;Park, Jaehyeon
    • Journal of Korean Society of Forest Science
    • /
    • v.90 no.2
    • /
    • pp.197-209
    • /
    • 2001
  • TOPMODEL, a semi-distributed hydrological model, is frequently applied to predict the amount of discharge, the main flow pathways, and water quality in a forested catchment, especially in a spatial dimension. TOPMODEL is a conceptual model, not a physical one. Its main concept is built on the topographic index and soil transmissivity, two components that can be used to predict the surface and subsurface contributing areas. This study was conducted to validate the applicability of TOPMODEL to small forested catchments in Korea. The experimental area is located in the Gwangneung forest operated by the Korea Forest Research Institute, Gyeonggi-do, near the Seoul metropolitan area. The two study catchments in this area have been monitored since 1979; one is a natural mature deciduous forest (22.0 ha) about 80 years old and the other a planted young coniferous forest (13.6 ha) about 22 years old. The data collected during two events in July 1995 and June 2000 at the mature deciduous forest and three events in July 1995, July 1999, and August 2000 at the young coniferous forest were used as the observed data sets, respectively. The topographic index was calculated using a 10 m × 10 m raster digital elevation model (DEM). The topographic index ranged from 2.6 to 11.1 at the deciduous and from 2.7 to 16.0 at the coniferous catchment. Optimization using the forecasting efficiency as the objective function showed that the model parameter m and the catchment mean of the surface saturated transmissivity, ln T0, had high sensitivity. The optimized values of m and ln T0 were 0.034 and 0.038; 8.672 and 9.475 at the deciduous and 0.031, 0.032, and 0.033; 5.969, 7.129, and 7.575 at the coniferous catchment, respectively. 
The forecasting efficiencies from the simulations using the optimized parameters were comparatively high: 0.958 and 0.909 at the deciduous and 0.825, 0.922, and 0.961 at the coniferous catchment. The observed and simulated hyeto-hydrographs showed that the times of lag to peak coincided well. Although the total runoff and peak flow of some events showed discrepancies between the observed and simulated output, TOPMODEL could overall predict the hydrologic output with an estimation error of less than 10%. Therefore, TOPMODEL is a useful tool for predicting runoff at ungauged forested catchments in Korea.
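The topographic index at the core of TOPMODEL is the standard ln(a / tan β), where a is the specific upslope area and β the local slope; a minimal per-cell sketch (function and argument names are illustrative, not from the paper):

```python
import math

def topographic_index(upslope_area_m2, cell_width_m, slope_rad):
    """TOPMODEL topographic index ln(a / tan(beta)): a is the specific
    upslope area (total upslope area divided by the contour width,
    here approximated by the grid cell width) and beta is the local
    surface slope angle in radians."""
    a = upslope_area_m2 / cell_width_m  # specific catchment area (m)
    return math.log(a / math.tan(slope_rad))
```

On a 10 m grid, a cell draining 1,000 m² on a 10% slope gives an index of ln(100 / 0.1) ≈ 6.9, inside the 2.6 to 16.0 range reported for the study catchments.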


Application of LCA Methodology on Lettuce Cropping Systems in Protected Cultivation (시설재배 상추에 대한 전과정평가 (LCA) 방법론 적용)

  • Ryu, Jong-Hee;Kim, Kye-Hoon
    • Korean Journal of Soil Science and Fertilizer
    • /
    • v.43 no.5
    • /
    • pp.705-715
    • /
    • 2010
  • The adoption of a carbon footprint system is being activated mostly in developed countries as a long-term response to tightened regulations and standards on carbon emission in the agricultural sector. The Korean Ministry of Environment excluded primary agricultural products from the carbon footprint system due to the lack of an LCI (life cycle inventory) database for agriculture. Therefore, research on and establishment of an agricultural LCI database for the carbon footprint system are urgent, and development of an LCA (life cycle assessment) methodology applicable to the Korean agricultural environment is also very important; such application is still at an early stage in Korea. This study was therefore carried out to assess the effect of lettuce cultivation on the agricultural environment by establishing an LCA methodology. Data on agricultural inputs and outputs for establishing the LCI were collected from statistical data and from documents on income from agricultural and livestock products prepared by the RDA. The LCA methodology for agriculture was reviewed by investigating the LCA methodologies and applications of foreign countries. Results based on 1 kg of lettuce production showed that inputs including N, P, organic fertilizers, compound fertilizers, and crop protectants were the main sources of emissions during the lettuce cropping process. The amounts of inputs had to be expressed in terms of active ingredients to estimate the actual quantities used. The major emissions from agricultural activities were N₂O (to air) and NO₃⁻/PO₄⁻ (to water) from fertilizers, organic compounds from pesticides, and air pollutants from fossil fuel combustion by agricultural machines. 
The LCIA (life cycle impact assessment) and LCA software packages used in Korea are 'PASS' and 'TOTAL', developed by the Ministry of Knowledge Economy and the Ministry of Environment, respectively. However, the models used in these packages were developed in foreign countries. In the future, models and factors for characterization, normalization, and weighting suited to the Korean agricultural environment need to be developed for more precise LCA analysis in the agricultural area.
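The characterization step mentioned above is, at its core, a weighted sum of inventory emissions; a minimal sketch with made-up inventory numbers (only the IPCC AR5 GWP-100 factors are real values):

```python
def characterize(emissions, factors):
    """LCIA characterization: multiply each inventory emission (kg) by
    its characterization factor (e.g. kg CO2-eq per kg) and sum to get
    the impact score for one impact category."""
    return sum(amount * factors.get(substance, 0.0)
               for substance, amount in emissions.items())

# GWP-100 characterization factors (IPCC AR5): CO2=1, CH4=28, N2O=265
gwp = {"CO2": 1.0, "CH4": 28.0, "N2O": 265.0}
# Hypothetical inventory per kg of lettuce (illustrative numbers only)
inventory = {"CO2": 0.8, "N2O": 0.002}
score = characterize(inventory, gwp)  # climate change score in kg CO2-eq
```

Normalization and weighting would then divide this score by a reference value and combine categories, which is where the Korea-specific factors the authors call for would enter.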

Development of Neural Network Based Cycle Length Design Model Minimizing Delay for Traffic Responsive Control (실시간 신호제어를 위한 신경망 적용 지체최소화 주기길이 설계모형 개발)

  • Lee, Jung-Youn;Kim, Jin-Tae;Chang, Myung-Soon
    • Journal of Korean Society of Transportation
    • /
    • v.22 no.3 s.74
    • /
    • pp.145-157
    • /
    • 2004
  • The cycle length design model of the Korean traffic-responsive signal control system is devised to vary the cycle length in response to changes in traffic demand in real time, using parameters specified by a system operator and field information such as the degrees of saturation of through phases. Since no explicit guideline is provided to the operator, the system tends to be ambiguous in terms of optimization. In addition, it has not yet been verified whether the cycle lengths produced by the existing model are comparable to the ones that minimize delay. This paper presents studies conducted (1) to find shortcomings in the existing model by comparing its cycle lengths against the delay-minimizing ones and (2) to propose a new way to design delay-minimizing cycle lengths that excludes such operator-dependent parameters. The study found that the cycle lengths from the existing model fail to minimize delay and lead to unsatisfactory intersection operation when traffic volume is low, due to the changed target operational volume-to-capacity ratio embedded in the model. Sixty-four neural network based cycle length design models were developed on simulation data used as a surrogate for field data. The CORSIM optimal cycle lengths minimizing delay were found with the COST software developed for this study, which searches for them with a heuristic method, a hybrid genetic algorithm. Among the 64 models, the one producing cycle lengths closest to the optimum was selected through statistical tests. A verification test showed that the selected model designs cycle lengths with a pattern similar to the delay-minimizing ones, and the resulting cycle lengths are comparable to those from TRANSYT-7F.
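The paper finds delay-minimizing cycle lengths by simulation and a hybrid genetic algorithm; as a classical point of reference only (not the authors' method), Webster's formula gives an approximate delay-minimizing cycle length from the lost time and critical flow ratios:

```python
def webster_cycle(lost_time_s, critical_flow_ratios):
    """Webster's approximate delay-minimizing cycle length (seconds):
    C0 = (1.5 L + 5) / (1 - Y), where L is the total lost time per
    cycle (s) and Y is the sum of the critical flow ratios over all
    phases."""
    Y = sum(critical_flow_ratios)
    if Y >= 1.0:
        raise ValueError("intersection is oversaturated (Y >= 1)")
    return (1.5 * lost_time_s + 5.0) / (1.0 - Y)
```

For example, 12 s of lost time with two critical flow ratios of 0.3 gives a cycle of 57.5 s; simulation-based searches like the paper's can deviate from this closed form because they capture effects Webster's model ignores.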

Optimization of the cryopreserved condition for utilization of GPCR frozen cells (GPCR 냉동보관 세포의 활용을 위한 냉동조건의 최적화 연구)

  • Noh, Hyojin;Lee, Sunghou
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.16 no.2
    • /
    • pp.1200-1206
    • /
    • 2015
  • G-protein coupled receptors (GPCRs), major targets for drug discovery, are involved in many physiological activities and related to various diseases and disorders. Among the experimental techniques in the GPCR drug discovery process, cell-based screening methods are influenced by the condition of the cells used throughout the process. Recently, the use of frozen cells has been suggested as a way to reduce data variation and cost. The aim of this study is to evaluate cell-freezing conditions such as storage temperature and storage term. Stable cell lines for the calcium-sensing receptor and the urotensin receptor were established, and cultured cells were stored at −80°C for up to 4 weeks. To compare with cells stored in liquid nitrogen, agonist and antagonist responses were recorded by luminescence detection of calcium-induced photoprotein activation. Cell signals decreased as the storage period increased, without changes in the EC50 and IC50 values (EC50: 3.46 ± 1.36 mM, IC50: 0.49 ± 0.15 μM). For cells stored in liquid nitrogen, responses were lower than in live cells, but no changes with storage period and no significant variations in EC50/IC50 values were detected. The decrease of cell signals in the various frozen cells may be due to increased cell damage. These results indicate that liquid nitrogen is the best condition for long-term cryopreservation, while −80°C storage can be adopted for short-term storage within a month. In conclusion, the active use of frozen cells may help decrease the variation of experimental data during the initial cell-based screening process.
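The EC50/IC50 values reported above are parameters of the standard four-parameter logistic (Hill) dose-response model; a minimal sketch of that model (the paper fits it to luminescence data, but the fitting procedure here is not specified, so only the curve itself is shown):

```python
def hill_response(conc, ec50, hill=1.0, bottom=0.0, top=1.0):
    """Four-parameter logistic (Hill) dose-response curve. EC50 is the
    concentration giving the half-maximal response; 'hill' controls
    the steepness, and bottom/top are the response plateaus."""
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** hill)
```

By construction the response at conc = EC50 is exactly halfway between bottom and top, which is why EC50/IC50 can stay constant even when the absolute signal amplitude (top) drops with storage, as observed in the study.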

A Recidivism Prediction Model Based on XGBoost Considering Asymmetric Error Costs (비대칭 오류 비용을 고려한 XGBoost 기반 재범 예측 모델)

  • Won, Ha-Ram;Shim, Jae-Seung;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.127-137
    • /
    • 2019
  • Recidivism prediction has been a subject of constant research by experts since the early 1970s, and it has become more important as crimes committed by recidivists steadily increase. In particular, after the US and Canada adopted the 'Recidivism Risk Assessment Report' as a decisive criterion in trials and parole screening in the 1990s, research on recidivism prediction became more active, and empirical studies on recidivism factors were also started in Korea in the same period. Although most recidivism prediction studies have so far focused on the factors of recidivism or the accuracy of prediction, it is also important to minimize the misclassification cost, because recidivism prediction has an asymmetric error cost structure. In general, the cost of misclassifying a person who will not reoffend as a recidivist is lower than the cost of misclassifying a person who will reoffend as a non-recidivist: the former increases only the additional monitoring cost, while the latter incurs large social and economic costs. Therefore, in this paper, we propose an XGBoost (eXtreme Gradient Boosting; XGB) based recidivism prediction model that considers asymmetric error costs. In the first step of the model, XGB, recognized as a high-performance ensemble method in the field of data mining, was applied, and its results were compared with various prediction models such as LOGIT (logistic regression), DT (decision trees), ANN (artificial neural networks), and SVM (support vector machines). In the next step, the classification threshold was optimized to minimize the total misclassification cost, which is the weighted average of the FNE (false negative error) and FPE (false positive error). To verify the usefulness of the model, it was applied to a real recidivism prediction dataset. 
As a result, it was confirmed that the XGB model not only showed better prediction accuracy than the other prediction models but also reduced the misclassification cost most effectively.
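The threshold-optimization step can be sketched as a scan over candidate cutoffs on the predicted probabilities; the cost weights below are illustrative, not the values used in the paper:

```python
def best_threshold(y_true, y_prob, cost_fn=5.0, cost_fp=1.0):
    """Cost-sensitive thresholding: try each distinct predicted
    probability as a cutoff and keep the one minimizing the total
    asymmetric misclassification cost, where a false negative (missed
    recidivist) is weighted more heavily than a false positive."""
    candidates = sorted(set(y_prob)) + [1.1]  # 1.1 = predict all negative
    best_t, best_cost = 0.5, float("inf")
    for t in candidates:
        fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < t)
        fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= t)
        cost = cost_fn * fn + cost_fp * fp
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost
```

Because false negatives cost more, the optimal cutoff typically lands below the accuracy-maximizing 0.5, trading extra false alarms for fewer missed recidivists.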

Kriging of Daily PM10 Concentration from the Air Korea Stations Nationwide and the Accuracy Assessment (베리오그램 최적화 기반의 정규크리깅을 이용한 전국 에어코리아 PM10 자료의 일평균 격자지도화 및 내삽정확도 검증)

  • Jeong, Yemin;Cho, Subin;Youn, Youjeong;Kim, Seoyeon;Kim, Geunah;Kang, Jonggu;Lee, Dalgeun;Chung, Euk;Lee, Yangwon
    • Korean Journal of Remote Sensing
    • /
    • v.37 no.3
    • /
    • pp.379-394
    • /
    • 2021
  • Air pollution data in South Korea have been provided on a real-time basis by the Air Korea stations since 2005. Previous studies have shown the feasibility of gridding air pollution data, but they were confined to a few cities. This paper examines the creation of nationwide gridded maps of PM10 concentration from the 333 Air Korea stations using variogram optimization and ordinary kriging. The accuracy of the spatial interpolation was evaluated with various sampling schemes to avoid a too dense or too sparse distribution of the validation points. Using the 114,745 matchups, a four-round blind test was conducted by extracting random validation points for each of the 365 days in 2019. The overall accuracy was stably high, with an MAE of 5.697 ㎍/m3 and a CC of 0.947. Approximately 1,500 high-PM10 cases also showed an MAE of about 12 ㎍/m3 and a CC over 0.87, which means the proposed method was effective and applicable to various situations. The gridded daily PM10 maps at a resolution of 0.05° also showed a reasonable spatial distribution and can be used as an input variable for gridded prediction of the next day's PM10 concentration.
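Ordinary kriging as used above can be sketched for a single target point; this is a minimal illustration with an exponential variogram and made-up parameters, not the optimized variogram of the paper:

```python
import numpy as np

def exp_variogram(h, nugget=0.0, sill=1.0, rng=1.0):
    """Exponential variogram model gamma(h) with a practical range."""
    return nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * h / rng))

def ordinary_kriging(coords, values, target, nugget=0.0, sill=1.0, rng=1.0):
    """Ordinary kriging estimate at one target point: build the kriging
    system from station-to-station variogram values, append a Lagrange
    multiplier row/column enforcing that the weights sum to 1, solve,
    and return the weighted combination of station values."""
    n = len(values)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_variogram(d, nugget, sill, rng)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = exp_variogram(np.linalg.norm(coords - target, axis=1),
                          nugget, sill, rng)
    w = np.linalg.solve(A, b)[:n]
    return float(w @ values)
```

With a zero nugget the estimator is exact at station locations; the variogram optimization step in the paper amounts to fitting nugget, sill, and range to each day's empirical variogram before solving this system on the 0.05° grid.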