• Title/Abstract/Keywords: random pattern

Search results: 605 items (processing time: 0.023 s)

A Time Series Graph based Convolutional Neural Network Model for Effective Input Variable Pattern Learning : Application to the Prediction of Stock Market (효과적인 입력변수 패턴 학습을 위한 시계열 그래프 기반 합성곱 신경망 모형: 주식시장 예측에의 응용)

  • Lee, Mo-Se;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / Vol. 24, No. 1 / pp.167-181 / 2018
  • Over the past decade, deep learning has been in the spotlight among machine learning algorithms. In particular, the CNN (Convolutional Neural Network), known as an effective solution for recognizing and classifying images or voices, has been widely applied to classification and prediction problems. In this study, we investigate how to apply CNNs to business problem solving. Specifically, this study proposes applying a CNN to stock market prediction, one of the most challenging tasks in machine learning research. As mentioned, CNNs are strong at interpreting images. Thus, the model proposed in this study adopts a CNN as a binary classifier that predicts stock market direction (upward or downward) using time series graphs as its inputs. That is, our proposal is to build a machine learning algorithm that mimics the experts called 'technical analysts', who examine graphs of past price movements and predict future financial price movements. Our proposed model, named 'CNN-FG (Convolutional Neural Network using Fluctuation Graph)', consists of five steps. In the first step, it divides the dataset into intervals of 5 days. In step 2, it creates time series graphs for the divided dataset. Each graph is drawn in an image of $40 \times 40$ pixels, and the graph of each independent variable is drawn in a different color. In step 3, the model converts the images into matrices. Each image is converted into a combination of three matrices that express the color values on the R (red), G (green), and B (blue) scales. In the next step, it splits the dataset of graph images into training and validation datasets. We used 80% of the total dataset as the training dataset and the remaining 20% as the validation dataset. In the final step, CNN classifiers are trained using the images of the training dataset.
Regarding the parameters of CNN-FG, we adopted two convolution filters ($5 \times 5 \times 6$ and $5 \times 5 \times 9$) in the convolution layer. In the pooling layer, a $2 \times 2$ max pooling filter was used. The numbers of nodes in the two hidden layers were set to 900 and 32, respectively, and the number of nodes in the output layer was set to 2 (one for the prediction of an upward trend and the other for a downward trend). The activation function for the convolution layer and the hidden layers was ReLU (Rectified Linear Unit), and the one for the output layer was the Softmax function. To validate CNN-FG, we applied it to the prediction of KOSPI200 for 2,026 days over eight years (from 2009 to 2016). To match the proportions of the two groups in the dependent variable (i.e. tomorrow's stock market movement), we selected 1,950 samples by random sampling. Finally, we built the training dataset using 80% of the total dataset (1,560 samples) and the validation dataset using the remaining 20% (390 samples). The independent variables of the experimental dataset included twelve technical indicators popularly used in previous studies. They include Stochastic %K, Stochastic %D, Momentum, ROC (rate of change), LW %R (Larry William's %R), A/D oscillator (accumulation/distribution oscillator), OSCP (price oscillator), CCI (commodity channel index), and so on. To confirm the superiority of CNN-FG, we compared its prediction accuracy with that of other classification models. Experimental results showed that CNN-FG outperforms LOGIT (logistic regression), ANN (artificial neural network), and SVM (support vector machine) with statistical significance. These empirical results imply that converting time series business data into graphs and building CNN-based classification models on these graphs can be effective from the perspective of prediction accuracy.
Thus, this paper sheds light on how to apply deep learning techniques to the domain of business problem solving.
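The image-preparation steps described in this abstract (5-day intervals, 40 x 40 pixel graphs, RGB matrices, 80/20 split) might be sketched roughly as follows. This is only an illustration under several assumptions that the abstract does not confirm: the windows slide by one day, each series is min-max scaled within its window, the colors in `COLORS` are arbitrary, and the data is synthetic. The function names `window_to_image`, `build_images`, and `split_80_20` are hypothetical, and the class balancing and CNN training steps are left out.

```python
import numpy as np

WINDOW = 5   # days per interval (from the abstract)
SIZE = 40    # images are 40 x 40 pixels (from the abstract)

# Hypothetical palette: the abstract only says each variable gets a different color.
COLORS = np.array([[255, 0, 0], [0, 255, 0], [0, 0, 255]], dtype=np.uint8)


def window_to_image(window):
    """Rasterize a (WINDOW, n_vars) array into a (SIZE, SIZE, 3) RGB matrix."""
    window = np.asarray(window, dtype=float)
    n_days, n_vars = window.shape
    image = np.full((SIZE, SIZE, 3), 255, dtype=np.uint8)  # white background
    cols = np.arange(SIZE)
    day_pos = np.linspace(0, SIZE - 1, n_days)  # x position of each day
    for v in range(n_vars):
        series = window[:, v]
        lo, hi = series.min(), series.max()
        span = hi - lo if hi > lo else 1.0      # avoid division by zero
        scaled = (series - lo) / span           # assumption: per-window min-max scaling
        y = np.interp(cols, day_pos, scaled)    # linear interpolation between days
        rows = np.round((1.0 - y) * (SIZE - 1)).astype(int)  # row 0 is the top
        image[rows, cols] = COLORS[v % len(COLORS)]  # one pixel per column
    return image


def build_images(indicators):
    """Turn a (n_days, n_vars) array into one image per sliding 5-day window."""
    indicators = np.asarray(indicators, dtype=float)
    n_windows = indicators.shape[0] - WINDOW + 1
    return np.stack([window_to_image(indicators[i:i + WINDOW])
                     for i in range(n_windows)])


def split_80_20(images, labels, seed=0):
    """Shuffle, then use 80% for training and the remaining 20% for validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))
    cut = len(images) * 8 // 10
    train, val = order[:cut], order[cut:]
    return images[train], labels[train], images[val], labels[val]


# Synthetic demonstration data: 54 days of 3 made-up indicators.
rng = np.random.default_rng(42)
demo = rng.normal(size=(54, 3)).cumsum(axis=0)
images = build_images(demo)                    # 50 windows
labels = rng.integers(0, 2, size=len(images))  # placeholder up/down labels
x_train, y_train, x_val, y_val = split_80_20(images, labels)
```

With the 1,950 samples reported in the abstract, the same integer arithmetic gives 1,560 training and 390 validation images.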

Comparision of Medical Care Utilization Patterns between Beneficiaries of Medical Aid and Medical Insurance (의료보호대상자의 의료이용양상)

  • Kim, Bok-Youn;Kim, Seok-Beom;Kim, Chang-Yoon;Kang, Pock-Soo;Chung, Jong-Hak
    • Journal of Yeungnam Medical Science / Vol. 8, No. 2 / pp.185-201 / 1991
  • A household survey was conducted to compare the patterns of morbidity and medical care utilization between medical aid beneficiaries and medical insurance beneficiaries. The study population included 285 medical aid beneficiaries, all of whom were surveyed, and 386 medical insurance beneficiaries selected by simple random sampling from a Dong (township) in Taegu. Well-trained interviewers mainly interviewed housewives using a structured questionnaire. The morbidity rates of acute illness during the 15-day period were 63 per 1,000 medical aid beneficiaries and 62 per 1,000 medical insurance beneficiaries. The rates for chronic illness were 123 per 1,000 medical aid beneficiaries and 73 per 1,000 medical insurance beneficiaries. The most common type of acute illness in both medical aid and medical insurance beneficiaries was respiratory disease. Among chronic illnesses, musculoskeletal disease was most common in medical aid beneficiaries, whereas gastrointestinal disease was most common in medical insurance beneficiaries. The mean duration of acute illness was 3.8 days for medical aid beneficiaries and 6.8 days for medical insurance beneficiaries. During the one-year period, the mean duration of chronic illness among medical aid beneficiaries was 11.5 months, almost twice as long as that among medical insurance beneficiaries. The pharmacy was the most preferred facility among acute illness patients in the medical aid group, whereas acute cases in the medical insurance group most commonly visited the clinic. Chronic cases in both groups visited the clinic most frequently. Some findings suggested that considerable unmet need existed among the medical aid beneficiaries. In acute cases, the average number of days on which medical aid users utilized medical facilities was lower than that of medical insurance users. For chronic cases, in contrast, the relationship was reversed. Geographical accessibility was the most important factor in the utilization of medical facilities.
Almost half of the study population correctly answered the questions about the source of funds for medical security. Most respondents considered the objective of medical security to be affordability. The chief complaint about hospital utilization was the complicated administrative procedures. These findings suggest that there were some problems in the medical aid system, especially in the referral system.


Comparison of Imposed Work of Breathing Between Pressure-Triggered and Flow-Triggered Ventilation During Mechanical Ventilation (기계환기시 압력유발법과 유량유발법 차이에 의한 부가적 호흡일의 비교)

  • Choi, Jeong-Eun;Lim, Chae-Man;Koh, Youn-Suck;Lee, Sang-Do;Kim, Woo-Sung;Kim, Dong-Soon;Kim, Won-Dong
    • Tuberculosis and Respiratory Diseases / Vol. 44, No. 3 / pp.592-600 / 1997
  • Background: The level of imposed work of breathing (WOB) is important for patient-ventilator synchrony and during weaning from mechanical ventilation. The triggering method and the sensitivity of the demand system are important determinants of the imposed WOB. Flow triggering is available on several modern ventilators and is believed to impose less work on a patient-triggered breath than pressure triggering. We compared the level of imposed WOB between the two triggering methods and between two sensitivity levels for each method (0.7 L/min vs 2.0 L/min for flow triggering; $-1\ \mathrm{cmH_2O}$ vs $-2\ \mathrm{cmH_2O}$ for pressure triggering). Methods: The subjects were 12 patients ($64.8 \pm 4.2$ yrs) on mechanical ventilation who had a stable respiratory pattern on CPAP of $3\ \mathrm{cmH_2O}$. The four triggering sensitivities were applied in random order. To determine the imposed WOB, tracheal end pressure was measured through the monitoring lumen of a Hi-Lo Jet tracheal tube (Mallinckrodt, New York, USA) using a pneumotachograph/pressure transducer (CP-100 pulmonary monitor, Bicore, Irvine, CA, USA). Other respiratory mechanics data were also obtained with the CP-100 pulmonary monitor. Results: The imposed WOB decreased by 37.5% with flow triggering at 0.7 L/min compared to pressure triggering at $-2\ \mathrm{cmH_2O}$, and by 14% with pressure triggering at $-1\ \mathrm{cmH_2O}$ compared to $-2\ \mathrm{cmH_2O}$ (p < 0.05 for each). The PTP (Pressure Time Product) also decreased significantly with flow triggering at 0.7 L/min and with pressure triggering at $-1\ \mathrm{cmH_2O}$ compared to pressure triggering at $-2\ \mathrm{cmH_2O}$ (p < 0.05 for each). The proportion of imposed WOB in total WOB ranged from 37% to 85% and did not change significantly among the different methods and sensitivities. The physiologic WOB showed no significant changes among the different triggering methods and sensitivities.
Conclusion: To reduce the imposed WOB, flow triggering with a sensitivity of 0.7 L/min would be a better method than pressure triggering with a sensitivity of $-2\ \mathrm{cmH_2O}$.


Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems / Vol. 18, No. 2 / pp.29-45 / 2012
  • Bond rating is regarded as an important event for measuring the financial risk of companies and for determining the investment returns of investors. As a result, predicting companies' credit ratings by applying statistical and machine learning techniques has been a popular research topic. Statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis, have traditionally been used in bond rating. However, one major drawback is that these techniques rest on strict assumptions. Such assumptions include linearity, normality, independence among predictor variables, and pre-existing functional forms relating the criterion variables and the predictor variables. These strict assumptions of traditional statistics have limited their application to the real world. Machine learning techniques used in bond rating prediction models include decision trees (DT), neural networks (NN), and the Support Vector Machine (SVM). SVM in particular is recognized as a new and promising classification and regression analysis method. SVM learns a separating hyperplane that maximizes the margin between two categories. SVM is simple enough to be analyzed mathematically and leads to high performance in practical applications. SVM implements the structural risk minimization principle and seeks to minimize an upper bound of the generalization error. In addition, the solution of SVM may be a global optimum, and thus overfitting is unlikely to occur with SVM. Furthermore, SVM does not require many data samples for training, since it builds prediction models using only some representative samples near the boundaries, called support vectors. A number of experimental studies have indicated that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can potentially degrade SVM's performance.
First, SVM was originally proposed for solving binary-class classification problems. Methods for combining SVMs for multi-class classification, such as One-Against-One and One-Against-All, have been proposed, but they do not perform as well in multi-class classification problems as SVM does in binary-class classification. Second, approximation algorithms (e.g. decomposition methods, the sequential minimal optimization algorithm) can be used to reduce computation time in multi-class problems, but they can deteriorate classification performance. Third, multi-class prediction problems suffer from the data imbalance problem, which occurs when the number of instances in one class greatly outnumbers the number of instances in another class. Such data sets often yield a default classifier with a skewed boundary, which reduces classification accuracy. SVM ensemble learning is one of the machine learning methods for coping with the above drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms. AdaBoost is one of the most widely used ensemble learning techniques. It constructs a composite classifier by sequentially training classifiers while increasing the weight on misclassified observations through iterations. Observations that are incorrectly predicted by previous classifiers are chosen more often than those that are correctly predicted. Thus, boosting attempts to produce new classifiers that are better able to predict the examples for which the current ensemble's performance is poor. In this way, it can reinforce the training of misclassified observations of the minority class. This paper proposes multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve the multiclass prediction problem.
Since MGM-Boost introduces the notion of the geometric mean into AdaBoost, its learning process can take into account the geometric mean-based accuracy and errors across multiple classes. This study applies MGM-Boost to a real-world bond rating case for Korean companies to examine its feasibility. 10-fold cross validation is performed three times with different random seeds in order to ensure that the comparison among the three classifiers does not happen by chance. For each 10-fold cross validation, the entire data set is first partitioned into ten equal-sized sets, and then each set is in turn used as the test set while the classifier trains on the other nine sets. That is, cross-validated folds have been tested independently of each algorithm. Through these steps, we obtained results for the classifiers on each of the 30 experiments. In the comparison of arithmetic mean-based prediction accuracy between individual classifiers, MGM-Boost (52.95%) shows higher prediction accuracy than both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows higher prediction accuracy than AdaBoost (24.65%) and SVM (15.42%) in terms of geometric mean-based prediction accuracy. A t-test is used to examine whether the performance of each classifier over the 30 folds is significantly different. The results indicate that the performance of MGM-Boost is significantly different from that of the AdaBoost and SVM classifiers at the 1% level. These results mean that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
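The evaluation protocol in this abstract (three repetitions of 10-fold cross validation, scored by arithmetic and geometric mean-based accuracy) can be illustrated with a short sketch. The abstract does not give the exact formulas, so this assumes that both measures are taken over per-class accuracies (recalls); the function names and the toy labels are hypothetical, and the MGM-Boost algorithm itself is not reproduced here.

```python
import numpy as np


def per_class_recall(y_true, y_pred, n_classes):
    """Fraction of each class's instances that were predicted correctly."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = []
    for c in range(n_classes):
        mask = y_true == c
        if mask.any():                      # skip classes absent from y_true
            recalls.append(np.mean(y_pred[mask] == c))
    return np.array(recalls)


def arithmetic_mean_accuracy(y_true, y_pred, n_classes):
    """Assumed definition: arithmetic mean of the per-class recalls."""
    return float(np.mean(per_class_recall(y_true, y_pred, n_classes)))


def geometric_mean_accuracy(y_true, y_pred, n_classes):
    """Assumed definition: geometric mean of the per-class recalls."""
    r = per_class_recall(y_true, y_pred, n_classes)
    return float(np.prod(r) ** (1.0 / len(r)))


def repeated_kfold_indices(n_samples, n_folds=10, seeds=(0, 1, 2)):
    """Yield (seed, fold, train_idx, test_idx) for each of len(seeds) * n_folds runs."""
    for seed in seeds:
        perm = np.random.default_rng(seed).permutation(n_samples)
        folds = np.array_split(perm, n_folds)
        for i in range(n_folds):
            train_idx = np.concatenate(
                [folds[j] for j in range(n_folds) if j != i])
            yield seed, i, train_idx, folds[i]


# Toy imbalanced example: a classifier that almost always predicts class 0.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 1, 0, 0, 0]
arith = arithmetic_mean_accuracy(y_true, y_pred, n_classes=3)
geo = geometric_mean_accuracy(y_true, y_pred, n_classes=3)
runs = list(repeated_kfold_indices(100))
```

In the toy example the geometric mean collapses to zero because one class is never predicted correctly, while the arithmetic mean does not. This is one plausible reading of why the geometric mean-based figures in the abstract are much lower than the arithmetic ones and why a booster that targets the geometric mean pays more attention to minority classes.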

Preliminary Report of the 1998-1999 Patterns of Care Study of Radiation Therapy for Esophageal Cancer in Korea (식도암 방사선 치료에 대한 Patterns of Care Study (1998-1999)의 예비적 결과 분석)

  • Hur, Won-Joo;Choi, Young-Min;Lee, Hyung-Sik;Kim, Jeung-Kee;Kim, Il-Han;Lee, Ho-Jun;Lee, Kyu-Chan;Kim, Jung-Soo;Chun, Mi-Son;Kim, Jin-Hee;Ahn, Yong-Chan;Kim, Sang-Gi;Kim, Bo-Kyung
    • Radiation Oncology Journal / Vol. 25, No. 2 / pp.79-92 / 2007
  • Purpose: For the first time, a nationwide survey in the Republic of Korea was conducted to determine the basic parameters for the treatment of esophageal cancer and to offer a solid cooperative system for the Korean Pattern of Care Study database. Materials and Methods: During 1998-1999, 246 biopsy-confirmed esophageal cancer patients who received radiotherapy were enrolled from 23 different institutions in South Korea. Random sampling was based on the power allocation method. Patient parameters and specific information regarding tumor characteristics and treatment methods were collected and registered through the web-based PCS system. The data were analyzed with the Chi-squared test. Results: The median age of the enrolled patients was 62 years. The male to female ratio was about 91 to 9, an overwhelming male predominance. The performance status ranged from ECOG 0 to 1 in 82.5% of the patients. Diagnostic procedures included an esophagogram (228 patients, 92.7%), endoscopy (226 patients, 91.9%), and a chest CT scan (238 patients, 96.7%). Squamous cell carcinoma was diagnosed in 96.3% of the patients; mid-thoracic esophageal cancer was most prevalent (110 patients, 44.7%), and 135 patients presented with clinical stage III disease. Fifty-seven patients received radiotherapy alone and 37 patients received surgery with adjuvant postoperative radiotherapy. Half of the patients (123 patients) received chemotherapy together with RT, and 70 of them (56.9%) received it as concurrent chemoradiotherapy. The most frequently used chemotherapeutic regimen was a combination of cisplatin and 5-FU. Most patients received radiotherapy with either 6 MV (116 patients, 47.2%) or 10 MV photons (87 patients, 35.4%). Radiotherapy was delivered through a conventional AP-PA field for 206 patients (83.7%) without using a CT plan, and the median delivered dose was 3,600 cGy.
The median total dose of postoperative radiotherapy was 5,040 cGy, while for the non-operative patients the median total dose was 5,970 cGy. Thirty-four patients received intraluminal brachytherapy with high dose rate Iridium-192. Brachytherapy was delivered with a median dose of 300 cGy per fraction and was typically delivered 3 to 4 times. The most frequently encountered complication during radiotherapy was esophagitis, in 155 patients (63.0%). Conclusion: For the evaluation and treatment of esophageal cancer patients at radiation facilities in Korea, this study will provide guidelines and benchmark data for the solid cooperative systems of the Korean PCS. Although some differences were noted between institutions, there was no major difference in the treatment modalities and RT techniques.