• Title/Summary/Keyword: linear prediction


Performance of Investment Strategy using Investor-specific Transaction Information and Machine Learning (투자자별 거래정보와 머신러닝을 활용한 투자전략의 성과)

  • Kim, Kyung Mock;Kim, Sun Woong;Choi, Heung Sik
    • Journal of Intelligence and Information Systems, v.27 no.1, pp.65-82, 2021
  • Stock market investors are generally divided into foreign investors, institutional investors, and individual investors. Compared with individual investors, professional investor groups such as foreign investors have advantages in information and capital, and as a result foreign investors are known to achieve good investment performance among market participants. The purpose of this study is to propose an investment strategy that combines investor-specific transaction information with machine learning, and to analyze the portfolio investment performance of the proposed model using actual stock price and investor-specific transaction data. The Korea Exchange provides securities firms with daily purchase and sale volumes for each investor type. We developed a data collection program in C# using the API provided by Daishin Securities Cybosplus, and collected daily opening prices, closing prices, and investor-specific net purchase data for 151 of the KOSPI 200 stocks from January 2, 2007 to July 31, 2017. The self-organizing map is an artificial neural network that performs clustering by unsupervised learning; it was introduced by Teuvo Kohonen in 1984. It implements competition among the neurons within a layer, and all connections are feedforward (non-recurrent), running from the bottom layer to the top. It can be extended to multiple layers, although a single layer is commonly used. Linear functions serve as the activation functions of the neurons, and learning uses the instar rule as well as general competitive learning. The backpropagation model, by contrast, is an artificial neural network that performs classification by supervised learning. We used the self-organizing map to group and transform the investor-specific transaction volume data, which was then used to train the backpropagation model.
Portfolios were rebalanced monthly based on the trained model's predictions for the validation data. For performance analysis, a passive portfolio was designated, and KOSPI 200 and KOSPI index returns were obtained as proxies for market returns. Performance was analyzed using the equally weighted portfolio return, compound return, annual return, maximum drawdown (MDD), standard deviation, and Sharpe ratio. The buy-and-hold return of the top 10 stocks by market capitalization was designated as the benchmark, since buy-and-hold is the best strategy under the efficient market hypothesis. The backpropagation model's prediction accuracy was a very high 96.61% on the training data and a relatively high 57.1% on the validation data. The quality of the self-organizing map grouping can be judged from the backpropagation results, because poor grouping by the self-organizing map would have produced poor learning results in the backpropagation model. On this basis, the machine learning models are judged to have learned better than those of previous studies. Our portfolio earned twice the benchmark return and outperformed the KOSPI and KOSPI 200 indexes. Its risk indicators, MDD and standard deviation, were also better than the benchmark's, and its Sharpe ratio was higher than those of the benchmark and both market indexes. These results indicate a direction for portfolio construction programs based on machine learning and investor-specific transaction information, and show that such programs can be developed for real stock investment. The reported return results from forming the portfolio monthly and rebalancing the assets to equal weights.
Better results can be expected if, when each monthly portfolio is formed, the suggested stocks are continuously rebalanced rather than sold and re-bought. The approach therefore appears relevant to real trading.
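
The competitive learning step described in this abstract can be sketched in Python with NumPy. This is a minimal single-layer, winner-take-all network trained with the instar rule; it omits the neighbourhood function of a full Kohonen map, and the farthest-point initialisation, decaying learning rate, and two-regime synthetic data are illustrative assumptions, not details from the study.

```python
import numpy as np

def init_weights(data, n_units):
    # Farthest-point initialisation: start from the first sample, then keep
    # adding the sample that is farthest from every unit chosen so far.
    chosen = [0]
    for _ in range(1, n_units):
        dists = np.linalg.norm(data[:, None, :] - data[chosen][None, :, :], axis=2)
        chosen.append(int(np.argmax(dists.min(axis=1))))
    return data[chosen].astype(float)

def train_competitive(data, n_units=2, epochs=50, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    weights = init_weights(data, n_units)
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)          # linearly decaying learning rate
        for x in data[rng.permutation(len(data))]:
            # Competition: the unit whose weight vector is closest to x wins.
            winner = np.argmin(np.linalg.norm(weights - x, axis=1))
            # Instar rule: only the winner's weights move toward the input.
            weights[winner] += rate * (x - weights[winner])
    return weights

def assign_clusters(data, weights):
    # Each sample is labelled with the index of its winning unit.
    dists = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

# Synthetic, hypothetical daily net-purchase features
# (foreign, institutional, individual) forming two regimes.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([1.0, 0.5, -1.5], 0.1, size=(20, 3)),
               rng.normal([-1.0, -0.5, 1.5], 0.1, size=(20, 3))])
W = train_competitive(X, n_units=2)
labels = assign_clusters(X, W)
```

In a pipeline like the one described, the cluster index (or the distances to each unit) would become an input feature for the supervised backpropagation classifier.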

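The performance measures listed in the first abstract can be computed from a series of monthly portfolio returns as sketched below. Annualisation over 12 months, the sample standard deviation, and the zero default risk-free rate are assumptions; the abstract does not state the paper's exact conventions.

```python
import numpy as np

def equal_weight_return(asset_returns):
    # Rebalancing to equal weights each month makes the portfolio return
    # the cross-sectional mean of the selected stocks' monthly returns.
    return np.asarray(asset_returns, dtype=float).mean(axis=1)

def performance(monthly_returns, rf_monthly=0.0):
    r = np.asarray(monthly_returns, dtype=float)
    wealth = np.concatenate([[1.0], np.cumprod(1.0 + r)])  # value of 1 unit invested
    compound = wealth[-1] - 1.0                             # cumulative compound return
    annual = wealth[-1] ** (12.0 / len(r)) - 1.0            # geometric annual return
    peaks = np.maximum.accumulate(wealth)
    mdd = np.max(1.0 - wealth / peaks)                      # maximum drawdown (fraction)
    sd = np.std(r, ddof=1)
    return {
        "compound": compound,
        "annual": annual,
        "mdd": mdd,
        "std_annual": sd * np.sqrt(12.0),                   # annualised volatility
        "sharpe": (r.mean() - rf_monthly) / sd * np.sqrt(12.0),
    }

m = performance([0.10, -0.20, 0.05, 0.10])
```

Here the drawdown is measured from the running peak of the wealth curve, including the initial value, so a 10% gain followed by a 20% loss gives an MDD of 0.2.
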
Development of Prediction Model for the Na Content of Leaves of Spring Potatoes Using Hyperspectral Imagery (초분광 영상을 이용한 봄감자의 잎 Na 함량 예측 모델 개발)

  • Park, Jun-Woo;Kang, Ye-Seong;Ryu, Chan-Seok;Jang, Si-Hyeong;Kang, Kyung-Suk;Kim, Tae-Yang;Park, Min-Jun;Baek, Hyeon-Chan;Song, Hye-Young;Jun, Sae-Rom;Lee, Su-Hwan
    • Korean Journal of Agricultural and Forest Meteorology, v.23 no.4, pp.316-328, 2021
  • In this study, a prediction model for the leaf Na content of spring potato was established using a 400-1000 nm hyperspectral sensor, with the aim of developing a multispectral sensor for salinity monitoring on reclaimed land. The irrigation conditions were standard, drought, and salinity (2, 4, and 8 dS/m), and the irrigation amount was calculated from the amount of evaporation. Leaf Na content was measured 1 and 2 weeks after the start of irrigation in each of the vegetative, tuber formation, and tuber growth periods. Leaf reflectance was converted from 5 nm to 10, 25, and 50 nm FWHM (full width at half maximum) at 10 nm wavelength intervals. Using the variable importance in projection of partial least squares regression (PLSR-VIP), ten band ratios were selected as variables for predicting salinity damage levels from the Na content of spring potato leaves. Multiple linear regression (MLR) models were then estimated by removing the band ratios one at a time, in order of lowest weight among the ten. The models were compared not only by $R^2$ and MAPE but also by the number of band ratios and the optimal FWHM, with a view to developing a compact multispectral sensor. A 25 nm FWHM was advantageous for predicting the leaf Na content of spring potatoes in weeks 1 and 2 of the vegetative and tuber formation periods and in week 2 of the tuber growth period. The selected bandpass filters comprised 15 band ratios, mainly in the red and red-edge regions: 430/440, 490/500, 500/510, 550/560, 570/580, 590/600, 640/650, 650/660, 670/680, 680/690, 690/700, 700/710, 710/720, 720/730, and 730/740 nm.
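
The band-ratio construction and the MLR elimination loop can be sketched as follows. The importance scores are assumed to come from a PLSR-VIP step that is not reproduced here, and the function names and synthetic test data are hypothetical.

```python
import numpy as np

def band_ratios(reflectance, wavelengths, pairs):
    # reflectance: (n_samples, n_bands); pairs: [(numerator_nm, denominator_nm), ...]
    idx = {w: i for i, w in enumerate(wavelengths)}
    return np.column_stack([reflectance[:, idx[a]] / reflectance[:, idx[b]]
                            for a, b in pairs])

def fit_mlr(X, y):
    A = np.column_stack([np.ones(len(X)), X])            # intercept + predictors
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    pred = A @ coef
    r2 = 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
    mape = 100.0 * np.mean(np.abs((y - pred) / y))      # assumes y has no zeros
    return coef, r2, mape

def backward_by_importance(X, y, importance):
    # Refit after dropping the lowest-importance remaining ratio each round.
    order = list(np.argsort(importance)[::-1])           # most -> least important
    results = []
    while order:
        _, r2, mape = fit_mlr(X[:, order], y)
        results.append((len(order), r2, mape))
        order.pop()
    return results
```

Each entry of the result pairs the number of band ratios with $R^2$ and MAPE, which is the trade-off the study weighs when choosing a compact filter set.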

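The conversion to broader bandwidths in the potato study can be approximated by weighting the 5 nm spectrum with a Gaussian filter of the target FWHM, as sketched below. The Gaussian band shape is an assumption; the abstract does not state which filter response the authors used.

```python
import numpy as np

def resample_fwhm(wl, refl, centers, fwhm):
    # Gaussian band response: convert FWHM to the standard deviation sigma.
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    refl = np.asarray(refl, dtype=float)
    bands = []
    for c in centers:
        w = np.exp(-0.5 * ((wl - c) / sigma) ** 2)        # filter transmission weights
        bands.append((refl * w).sum(axis=-1) / w.sum())   # weighted mean reflectance
    return np.stack(bands, axis=-1)

wl = np.arange(400, 1001, 5)            # 5 nm source sampling
centers = np.arange(400, 1001, 10)      # 10 nm output grid
```

Calling `resample_fwhm(wl, spectra, centers, 25.0)` on an `(n_samples, n_bands)` array returns one simulated 25 nm band per 10 nm center for every sample.
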
Application of deep learning method for decision making support of dam release operation (댐 방류 의사결정지원을 위한 딥러닝 기법의 적용성 평가)

  • Jung, Sungho;Le, Xuan Hien;Kim, Yeonsu;Choi, Hyungu;Lee, Giha
    • Journal of Korea Water Resources Association, v.54 no.spc1, pp.1095-1105, 2021
  • More advanced dam operation is increasingly required to cope with rainy seasons, typhoons, and torrential rains. In addition, physical models based on specific rules can have limitations in controlling dam release discharge because of inherent uncertainty and complex factors. This study aims to forecast the water level at the station nearest the dam multiple time steps ahead using LSTM (Long Short-Term Memory), a deep learning model, and to evaluate its usefulness for decisions on dam release discharge. The LSTM model was trained and tested on eight data sets with a 1-hour temporal resolution covering about 13 years (2009-2021), including the primary data used in dam operation and downstream water level station data. The trained model forecasted the water level time series at six lead times (1, 3, 6, 9, 12, and 18 hours), and the forecasts were compared with the observed data. The 1-hour-ahead predictions performed best in all cases, with an average MAE of 0.01 m, RMSE of 0.015 m, and NSE of 0.99. As the lead time increases, the predictive performance of the model tends to decrease slightly. The model closely reproduces and reliably predicts the temporal pattern of the observed water level. Thus, the LSTM model appears able to produce predictive data by extracting the characteristics of complex, non-linear hydrological data, and it can be used to determine the dam release discharge when simulating dam operation.
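
The multi-lead-time setup and the three accuracy measures can be sketched as follows. The windowing scheme (a fixed number of past hourly values as input) is an assumption about how the LSTM training samples were built, and the LSTM itself is not shown.

```python
import numpy as np

def make_windows(series, n_in, lead):
    # Each sample: the last n_in hourly values -> the value `lead` hours later.
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for t in range(n_in, len(series) - lead + 1):
        X.append(series[t - n_in:t])
        y.append(series[t + lead - 1])
    return np.array(X), np.array(y)

def mae(obs, sim):
    return np.mean(np.abs(obs - sim))

def rmse(obs, sim):
    return np.sqrt(np.mean((obs - sim) ** 2))

def nse(obs, sim):
    # Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the mean.
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - np.mean(obs)) ** 2)
```

For the six lead times in the study (1, 3, 6, 9, 12, and 18 hours), one would call `make_windows` with the same input length for each lead and train a separate model or output head per lead time.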

Development of Unfolding Energy Spectrum with Clinical Linear Accelerator based on Transmission Data (물질투과율 측정정보 기반 의료용 선형가속기의 에너지스펙트럼 유도기술 개발)

  • Choi, Hyun Joon;Park, Hyo Jun;Yoo, Do Hyeon;Kim, Byoung-Chul;Yi, Chul-Young;Min, Chul Hee
    • Journal of Radiation Protection and Research, v.41 no.1, pp.41-47, 2016
  • Background: For accurate dose assessment in radiation therapy, the energy spectrum of the photon beam generated in the linac head is essential. The aim of this study is to develop a technique to accurately unfold the energy spectrum with the transmission analysis method. Materials and Methods: A clinical linear accelerator and the Monte Carlo method were employed to evaluate the transmission signals as a function of attenuator thickness, and the ion chamber response function was then determined with monoenergetic beams. Finally, the energy spectrum was unfolded with the HEPROW program. The Elekta Synergy Platform and the Geant4 toolkit were used in this study. Results and Discussion: In the comparison between calculated and measured transmission signals using an aluminum alloy attenuator, the root mean squared error was 0.43%. In the comparison between the spectrum unfolded with HEPROW and the spectrum calculated with Geant4, the differences in peak and mean energy were 0.066 and 0.03 MeV, respectively. However, accurate prediction of the energy spectrum requires additional experiments with various types of materials and improvement of the unfolding program. Conclusion: This research demonstrated that the spectrum unfolding technique can be used for megavoltage photon beams with an aluminum alloy attenuator and the HEPROW program.
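
The transmission relation behind the method can be sketched as $T(x) = \sum_i \phi_i R_i e^{-\mu_i x} / \sum_i \phi_i R_i$, where $\phi_i$ is the fluence in energy bin $i$, $R_i$ the ion chamber response, and $\mu_i$ the attenuation coefficient. HEPROW's algorithms are not reproduced here; the sketch instead uses a simple multiplicative (MLEM-style) iteration to recover a non-negative spectrum, and the three-bin energies, $\mu$ values, and responses are hypothetical placeholders.

```python
import numpy as np

# Hypothetical three-bin example (placeholders, not values from the study):
energies = np.array([0.5, 2.0, 5.0])      # bin energies (MeV)
mu = np.array([0.23, 0.12, 0.075])        # attenuation coefficients of the absorber (1/cm)
response = np.array([1.0, 1.1, 1.2])      # relative ion-chamber response per bin
thickness = np.linspace(0.0, 20.0, 9)     # absorber thicknesses (cm), first one is 0

def transmission(fluence, mu, response, thickness):
    # Signal behind each thickness, normalised to the open-field (x = 0) signal.
    w = fluence * response
    A = np.exp(-np.outer(thickness, mu))  # (n_thickness, n_bins) attenuation factors
    return A @ w / w.sum()

def unfold(measured, mu, response, thickness, iters=5000):
    # Multiplicative MLEM-style update; keeps the response-weighted fluence w >= 0.
    A = np.exp(-np.outer(thickness, mu))
    w = np.full(len(mu), 1.0 / len(mu))
    col_sums = A.sum(axis=0)
    for _ in range(iters):
        w *= (A.T @ (measured / (A @ w))) / col_sums
    fluence = w / response
    return fluence / fluence.sum()

def mean_energy(fluence, energies):
    return np.sum(fluence * energies) / np.sum(fluence)
```

Because unfolding from a few transmission values is ill-posed, the recovered spectrum reproduces the measured transmissions but need not match the true spectrum bin by bin; this is one reason the abstract calls for more attenuator materials.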

Estimation of fire Experiment Prediction by Utility Tunnels Fire Experiment and Simulation (지하공동구 화재 실험 및 시뮬레이션에 의한 화재 설칠 예측 평가)

  • 윤명오;고재선;박형주;박성은
    • Fire Science and Engineering, v.15 no.1, pp.23-33, 2001
  • Utility tunnels have become important national infrastructure with recent developments in communications. However, fires in utility tunnels are difficult to deal with. When cables burn, black smoke containing poisonous gas is produced; this smoke fills the tunnel and makes the fire difficult to extinguish. As a result, past utility tunnel fires have paralyzed the nation's central nervous system, causing property damage and communication interruptions as well as inconvenience to the public. Based on fires that occurred in the past, this study reenacted a fire in a realistic utility tunnel model. The aim is to scientifically analyze the characteristics of the fire and to verify whether each fire protection system installed in the utility tunnel works properly at its set temperature. For the fire experiment, the utility tunnel was equipped with a linear heat detector, a fire door, a connected water spray system, and a ventilation system. A fixed section of the power supply cable was coated with fire-retardant coating, and a heating pipe was covered with fireproofing. The results showed a maximum temperature of $932\,^{\circ}\mathrm{C}$. The linear heat detector activated at its fixed temperature and indicated the fire location on the receiving panel, while the fire-retardant coating on the power supply cable did not function as fireproofing. The fireproofing protected the heating pipe for about 30 minutes.


Estimation of Shoot Development for a Single-stemmed Rose 'Vital' Based on Thermal Units in a Plant Factory System (식물공장 시스템에서 Thermal Units을 이용한 Single-Stemmed Rose 'Vital'의 신초발달 예측)

  • Yeo, Kyung-Hwan;Cho, Young-Yeol;Lee, Yong-Beom
    • Horticultural Science & Technology, v.28 no.5, pp.768-776, 2010
  • This study was conducted to predict the number and fresh weight of leaves and the total leaf area of a single-stemmed rose 'Vital' from accumulated thermal units, and to develop a shoot development model that predicts when the flowering shoot reaches each phenological stage in a plant factory system. The base temperature ($T_b$), optimum temperature ($T_{opt}$), and maximum temperature ($T_{max}$) were estimated by regressing the rate of shoot development against temperature over a temperature gradient. The rate of shoot development ($R$, $\mathrm{d}^{-1}$) for the phase from cutting to bud break (CT-BB) was best described by the linear model $R_b = -0.0089 + 0.0016\,\mathit{temp}$. The rate of shoot development for the phase from bud break to harvest (BB-HV) was fitted to the parabolic model $R_h = -0.0001\,\mathit{temp}^2 + 0.0054\,\mathit{temp} - 0.0484$. The $T_b$, $T_{opt}$, and $T_{max}$ values were 5.56, 27.0, and $42.7\,^{\circ}\mathrm{C}$, respectively, and $T_b$ was used in the thermal unit computations for shoot development. Number of leaves, leaf area (LA), and leaf fresh weight followed sigmoidal curves regardless of cutting time, and shoot development and leaf area were described as sigmoidal functions of thermal units. Leaf area was described as $\mathrm{LA} = 578.7\,[1 + (\text{thermal units}/956.1)^{-8.54}]^{-1}$. Estimated and observed shoot length and leaf fresh weight showed a reasonably good fit, 1.060 ($R^2 = 0.976^{***}$) and 1.043 ($R^2 = 0.955^{***}$), respectively. The average thermal units required from cutting to transplant and from transplant to harvest were $426 \pm 42\,^{\circ}\mathrm{C\cdot d}$ and $783 \pm 24\,^{\circ}\mathrm{C\cdot d}$, respectively.
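
The fitted rate equations and the leaf-area model can be checked numerically. The vertex of the parabolic BB-HV model, $-b/(2a) = 0.0054/0.0002 = 27.0$, matches $T_{opt}$; its upper root (about 42.65) matches $T_{max}$; and the root of the linear CT-BB model, $0.0089/0.0016 \approx 5.56$, matches $T_b$. Computing thermal units as daily mean temperature above $T_b$ with no upper cutoff is an assumption, since the abstract does not give the accumulation method.

```python
import numpy as np

T_BASE = 5.56  # base temperature (°C) reported in the abstract

def rate_cutting_to_budbreak(temp):
    # Linear CT-BB development-rate model (1/day).
    return -0.0089 + 0.0016 * temp

def rate_budbreak_to_harvest(temp):
    # Parabolic BB-HV development-rate model (1/day).
    return -0.0001 * temp**2 + 0.0054 * temp - 0.0484

def thermal_units(daily_mean_temps, t_base=T_BASE):
    # Cumulative degree-days above the base temperature, no upper cutoff.
    t = np.asarray(daily_mean_temps, dtype=float)
    return np.cumsum(np.maximum(t - t_base, 0.0))

def leaf_area(tu):
    # LA = 578.7 / (1 + (tu/956.1)^-8.54), rewritten so tu = 0 gives 0
    # without a division-by-zero warning.
    tu = np.asarray(tu, dtype=float)
    return 578.7 * tu**8.54 / (tu**8.54 + 956.1**8.54)

# Cardinal temperatures implied by the fitted coefficients.
a, b, c = -0.0001, 0.0054, -0.0484
t_opt = -b / (2 * a)                                  # vertex of the parabola
t_max = (-b - np.sqrt(b * b - 4 * a * c)) / (2 * a)   # upper zero-rate root
t_base = 0.0089 / 0.0016                              # zero of the linear model
```

At 956.1 thermal units the sigmoid reaches half its asymptote, so `leaf_area(956.1)` returns 289.35.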

STUDY ON THE GROWTH OF THE MANDIBLE USING WIDE OPEN LATERAL CEPHALOGRAM (Wide open lateral cephalogram을 이용한 하악골 성장에 관한 연구)

  • Moon, Sung-Uk;Park, Young-Guk;Chung, Kyu-Rhim
    • The Korean Journal of Orthodontics, v.31 no.1 s.84, pp.39-50, 2001
  • In orthodontic treatment, predicting the shape, growth rate, and growth direction of the mandible plays a major role in setting up the treatment plan and determining its duration and prognosis. Previous approaches have shown that linear and angular measurements on lateral cephalograms estimate these relatively accurately. This study aimed to show the shape of the mandible more clearly by preventing the overlap of the condylar head area that occurs in lateral cephalograms, and to estimate its growth rate by comparing growth quantity and ratio through lateral area measurement. The study included 40 patients: 14 Class I, 9 Class II, and 17 Class III. Wide open lateral cephalograms of the 40 patients were taken over an average period of 4 years 3 months, and linear and angular measurements were made on 11 items. The AutoCAD R14 application program was used to trace the mandibular outline and to measure and compare its lateral area. The conclusions were as follows:
1. Growth of mandibular body length (gonion-menton) was greatest in Class III, followed by Class I and Class II, and the mandibular body length of the Class III group tended to grow twice as fast as that of the Class II group.
2. For the lateral items Go-Me, A-Cd, B-Cp, E-F, and G-H, Class III showed a significant increase in the year-average growth quantity and rate, with an especially apparent difference relative to Class II.
3. Over the 4 years 3 months, the year-average growth quantity of the lateral area of the mandible was $1.0\,\mathrm{cm}^2$ for Class I, $0.8\,\mathrm{cm}^2$ for Class II, and $1.4\,\mathrm{cm}^2$ for Class III, corresponding to growth ratios of 11.9%, 11.8%, and 20.3%, respectively. Thus, the Class III group showed almost twice the growth ratio of the other groups, while Class I and Class II differed little.
4. Considering the results above, the difference in mandibular size between groups appears to be caused by differences in the growth rate and growth quantity of the mandible arising during growth, rather than by differences in congenital jaw size.
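
The lateral-area comparison can be sketched with the shoelace formula applied to an ordered outline of traced points. The outline coordinates, and expressing the growth ratio as a percentage of the initial area over the whole interval, are assumptions for illustration; the study measured areas in AutoCAD R14 and the abstract does not define its ratio precisely.

```python
import numpy as np

def polygon_area(points):
    # Shoelace formula for a simple polygon given as ordered (x, y) vertices.
    x, y = np.asarray(points, dtype=float).T
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def growth(area_before, area_after, years):
    # Year-average growth quantity and growth ratio (% of the initial area).
    gain = area_after - area_before
    return gain / years, 100.0 * gain / area_before

PERIOD_YEARS = 4 + 3 / 12   # average follow-up of 4 years 3 months
```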


Label Embedding for Improving Classification Accuracy Using AutoEncoder with Skip-Connections (다중 레이블 분류의 정확도 향상을 위한 스킵 연결 오토인코더 기반 레이블 임베딩 방법론)

  • Kim, Museong;Kim, Namgyu
    • Journal of Intelligence and Information Systems, v.27 no.3, pp.175-197, 2021
  • Recently, with the development of deep learning technology, research on unstructured data analysis has been actively conducted, showing remarkable results in fields such as classification, summarization, and generation. Among text analysis tasks, text classification is the most widely used in academia and industry. Text classification includes binary classification (one label from two classes), multi-class classification (one label from several classes), and multi-label classification (multiple labels from several classes). Because each instance has multiple labels, multi-label classification requires a different training method from binary and multi-class classification. In addition, as the numbers of labels and classes grow, the number of labels to be predicted increases, making prediction harder and performance improvement difficult. To overcome these limitations, label embedding has been actively studied: (i) the given high-dimensional label space is compressed into a low-dimensional latent label space, (ii) a model is trained to predict the compressed labels, and (iii) the predicted labels are restored to the original high-dimensional label space. Typical label embedding techniques include Principal Label Space Transformation (PLST), Multi-Label Classification via Boolean Matrix Decomposition (MLC-BMaD), and Bayesian Multi-Label Compressed Sensing (BML-CS). However, because these techniques consider only linear relationships between labels or compress labels by random transformation, they struggle to capture non-linear relationships between labels and cannot create a latent label space that sufficiently preserves the information of the original labels.
Recently, there have been increasing attempts to improve performance by applying deep learning to label embedding, most notably label embedding with an autoencoder, a deep learning model that is effective for data compression and restoration. However, traditional autoencoder-based label embedding suffers a large information loss when compressing a high-dimensional label space with a myriad of classes into a low-dimensional latent label space. This is related to the vanishing gradient problem that occurs during backpropagation. Skip connections were devised to solve this problem: by adding a layer's input to its output, they prevent vanishing gradients during backpropagation and allow efficient learning even when the network is deep. Skip connections are mainly used for image feature extraction in convolutional neural networks, and studies using them in autoencoders or in label embedding are still lacking. Therefore, in this study, we propose an autoencoder-based label embedding methodology in which skip connections are added to both the encoder and the decoder, forming a low-dimensional latent label space that reflects the information of the high-dimensional label space well. We applied the proposed methodology to actual paper keywords to derive a high-dimensional keyword label space and a low-dimensional latent label space. Using these, we conducted an experiment in which the compressed keyword vector in the latent label space was predicted from the paper abstract, and multi-label classification was evaluated after restoring the predicted keyword vector to the original label space. As a result, multi-label classification based on the proposed methodology far outperformed traditional multi-label classification methods on accuracy, precision, recall, and F1 score.
This indicates that the low-dimensional latent label space derived by the proposed methodology reflected the information of the high-dimensional label space well, which ultimately improved the performance of the multi-label classification itself. In addition, the utility of the proposed methodology was confirmed by comparing its performance across domain characteristics and numbers of latent label space dimensions.
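
The three label-embedding steps can be illustrated with PLST, the linear baseline named in this abstract, rather than the proposed skip-connection autoencoder. The sketch assumes a binary label matrix, projects it onto its top-k principal directions via SVD, and restores labels by thresholding at 0.5; the regression model that predicts the latent codes from text features (step ii) is omitted.

```python
import numpy as np

def plst_fit(Y, k):
    # Y: (n_samples, n_labels) binary label matrix; keep the top-k directions.
    mean = Y.mean(axis=0)
    _, _, vt = np.linalg.svd(Y - mean, full_matrices=False)
    return mean, vt[:k].T                       # V: (n_labels, k)

def plst_encode(Y, mean, V):
    # Step (i): compress labels into the k-dimensional latent label space.
    return (Y - mean) @ V

def plst_decode(Z, mean, V, threshold=0.5):
    # Step (iii): map (predicted) latent codes back and binarise each label.
    return ((Z @ V.T + mean) >= threshold).astype(int)
```

Because the projection is linear, labels whose co-occurrence is non-linear are reconstructed poorly at small k, which is the limitation the autoencoder-based method targets.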

Selection of Optimal Models for Predicting the Distribution of Invasive Alien Plants Species (IAPS) in Forest Genetic Resource Reserves (산림생태계 보호구역에서 외래식물 분포 예측을 위한 최적 모형의 선발)

  • Lim, Chi-hong;Jung, Song-hie;Jung, Su-young;Kim, Nam-shin;Cho, Yong-chan
    • Korean Journal of Environment and Ecology, v.34 no.6, pp.589-600, 2020
  • Effective conservation and management of protected areas require monitoring the establishment of invasive alien species and limiting their spread. We simulated the potential distribution of invasive alien plant species (IAPS) using three representative species distribution models (Bioclim, GLM, and MaxEnt) based on the IAPS distribution in the forest genetic resource reserve (2,274 ha) in Uljin-gun, Korea. Based on the simulation results, we then selected a realistic and suitable species distribution model that reflects the characteristics of the local region and its ecological management. The simulations predicted that IAPS tend to be distributed along linear landscape elements such as roads, as well as in some harvested forest areas. A statistical comparison of the predictions and accuracy of each model showed that the GLM and MaxEnt models generally had higher performance and accuracy than the Bioclim model. The Bioclim model calculated the largest potential distribution area, followed by GLM and MaxEnt. A phenomenological review of the simulation results showed that sample size affected the GLM and Bioclim models more strongly, while the MaxEnt model was the most consistent regardless of sample size. Overall, MaxEnt was the optimal model among the three for predicting the distribution of IAPS. The model selection approach based on detailed flora distribution data presented in this study is expected to be useful for efficiently managing conservation areas and for identifying realistic and precise species distribution models that reflect local characteristics.
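
The simplest of the three models, Bioclim, can be sketched as a per-variable percentile envelope around the presence records. This binary version with 5th to 95th percentile limits is a simplification chosen for illustration; GLM and MaxEnt are not shown.

```python
import numpy as np

def bioclim_fit(presence_env, lower_pct=5, upper_pct=95):
    # Envelope: per-variable percentile limits of the environment at presence points.
    lo = np.percentile(presence_env, lower_pct, axis=0)
    hi = np.percentile(presence_env, upper_pct, axis=0)
    return lo, hi

def bioclim_predict(env, lo, hi):
    # A grid cell is suitable only if every variable falls inside the envelope.
    return ((env >= lo) & (env <= hi)).all(axis=1)
```

The potential distribution area is then the number of suitable cells multiplied by the cell area. Because the envelope ignores absence or background data, it tends to be broad, which is consistent with Bioclim producing the largest area in the study.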

Prediction of patent lifespan and analysis of influencing factors using machine learning (기계학습을 활용한 특허수명 예측 및 영향요인 분석)

  • Kim, Yongwoo;Kim, Min Gu;Kim, Young-Min
    • Journal of Intelligence and Information Systems, v.28 no.2, pp.147-170, 2022
  • Although the number of patents, one of the core outputs of technological innovation, continues to increase, the number of low-value patents has also increased sharply, so efficient patent evaluation has become important. Estimation of patent lifespan, which represents the private value of a patent, has been studied for a long time, but most studies have relied on linear models. Even when machine learning methods were used, the relationship between the explanatory variables and patent lifespan was insufficiently interpreted or explained. In this study, patent lifespan (number of renewals) is predicted on the premise that it represents the value of the patent. We collected 4,033,414 patents filed between 1996 and 2017 and eventually granted from the USPTO (US Patent and Trademark Office). To predict patent lifespan, we use variables that reflect the characteristics of the patent, the patent owner, and the inventor. We build four models (Ridge Regression, Random Forest, Feed Forward Neural Network, and Gradient Boosting Model) and tune their hyperparameters through 5-fold cross-validation. We then evaluate the models' performance and present the relative importance of the predictors. In addition, based on the Gradient Boosting Model, which performed best, we present Accumulated Local Effects plots to visualize the relationship between the predictors and patent lifespan. Finally, we apply Kernel SHAP (SHapley Additive exPlanations) to present the reasons behind the evaluation of individual patents, and discuss its applicability to patent evaluation systems.
This study is academically meaningful in that it contributes cumulatively to existing patent lifespan estimation research and supplements the limitations of linear models. It is also practically meaningful in suggesting a method for deriving the evaluation basis of individual patent values and in examining its applicability to patent evaluation systems.
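
The ridge regression baseline with 5-fold cross-validation can be sketched in NumPy as follows. Features are standardised inside each training fold, the penalty grid is hypothetical, and synthetic data would stand in for the patent, owner, and inventor variables; the random forest, neural network, gradient boosting, ALE, and Kernel SHAP steps are not shown.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    # Closed-form ridge on standardised features; the intercept is not penalised.
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0
    Xs = (X - mu) / sd
    y_mean = y.mean()
    beta = np.linalg.solve(Xs.T @ Xs + alpha * np.eye(X.shape[1]),
                           Xs.T @ (y - y_mean))
    return mu, sd, y_mean, beta

def ridge_predict(model, X):
    mu, sd, y_mean, beta = model
    return ((X - mu) / sd) @ beta + y_mean

def cv_select_alpha(X, y, alphas, n_folds=5, seed=0):
    # Pick the penalty with the lowest mean squared error across the folds.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = []
    for alpha in alphas:
        mse = []
        for i in range(n_folds):
            test = folds[i]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
            model = ridge_fit(X[train], y[train], alpha)
            mse.append(np.mean((ridge_predict(model, X[test]) - y[test]) ** 2))
        scores.append(np.mean(mse))
    return alphas[int(np.argmin(scores))], scores
```

Standardising within each training fold, rather than on the full data set, keeps information from the held-out fold out of the fitted model.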