• Title/Summary/Keyword: In-Sample Prediction

Optimization of Support Vector Machines for Financial Forecasting (재무예측을 위한 Support Vector Machine의 최적화)

  • Kim, Kyoung-Jae;Ahn, Hyun-Chul
    • Journal of Intelligence and Information Systems / v.17 no.4 / pp.241-254 / 2011
  • Financial time-series forecasting is an important problem because it is essential to the risk management of financial institutions. Researchers have therefore tried to forecast financial time series with various data mining techniques such as regression, artificial neural networks, decision trees, and k-nearest neighbor. Recently, support vector machines (SVMs) have become popular in this research area because they do not require large amounts of training data and have a low risk of overfitting. However, a user must determine several design factors heuristically in order to use an SVM. For example, the choice of kernel function and its parameters and the selection of a proper feature subset are major design factors. Beyond these factors, selecting a proper instance subset may also improve the forecasting performance of an SVM by eliminating irrelevant and distorting training instances. Nonetheless, few studies have applied instance selection to SVMs, especially in the domain of stock market prediction. Instance selection tries to choose proper instance subsets from the original training data. It may be regarded as a method of knowledge refinement that maintains the instance base. This study proposes a novel instance selection algorithm for SVMs. The proposed technique uses a genetic algorithm (GA) to optimize instance selection and the kernel parameters simultaneously. We call the model ISVM (SVM with Instance selection). Experiments on stock market data are carried out using ISVM. The GA searches for optimal or near-optimal kernel parameter values and relevant instances for the SVM, so each chromosome carries two sets of codes: one for the kernel parameters and one for instance selection.
For the control parameters of the GA search, the population size is set to 50 organisms, the crossover rate to 0.7, and the mutation rate to 0.1. As the stopping condition, 50 generations are permitted. The application data consist of technical indicators and the direction of change in the daily Korea stock price index (KOSPI). The total sample covers 2218 trading days. We split the data into three subsets (training, test, and hold-out) containing 1056, 581, and 581 observations respectively. This study compares ISVM with several models: logistic regression (Logit), backpropagation neural networks (ANN), nearest neighbor (1-NN), conventional SVM (SVM), and SVM with optimized parameters (PSVM). In particular, PSVM uses kernel parameters optimized by the genetic algorithm. The experimental results show that ISVM outperforms 1-NN by 15.32%, ANN by 6.89%, Logit and SVM by 5.34%, and PSVM by 4.82% on the hold-out data. ISVM uses only 556 of the 1056 original training instances to produce this result. In addition, the two-sample test for proportions is used to examine whether ISVM significantly outperforms the other models. The results indicate that ISVM outperforms ANN and 1-NN at the 1% significance level and performs better than Logit, SVM, and PSVM at the 5% significance level.
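
The chromosome layout described above (codes for the kernel parameters followed by one selection bit per training instance) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract gives the population size, crossover rate, mutation rate, and generation count, but the bit width `PARAM_BITS`, the parameter ranges, the choice of `C` and `gamma` as the two kernel parameters, truncation selection, elitism, and per-bit mutation are all assumptions, and `toy_fitness` is a hypothetical stand-in for the accuracy of an SVM trained on the selected instances.

```python
import random

POP_SIZE = 50          # population size reported in the abstract
CROSSOVER_RATE = 0.7   # crossover rate reported in the abstract
MUTATION_RATE = 0.1    # mutation rate reported in the abstract (applied per bit: assumption)
GENERATIONS = 50       # stopping condition reported in the abstract
PARAM_BITS = 8         # assumption: bits used to encode each kernel parameter


def random_chromosome(n_instances, rng):
    """Two parameter genes (PARAM_BITS each) followed by one bit per training instance."""
    return [rng.randint(0, 1) for _ in range(2 * PARAM_BITS + n_instances)]


def decode(chrom, c_range=(1.0, 100.0), gamma_range=(0.01, 10.0)):
    """Split a chromosome into (C, gamma, instance mask). The ranges are assumptions."""
    def to_value(bits, lo, hi):
        as_int = int("".join(map(str, bits)), 2)
        return lo + (hi - lo) * as_int / (2 ** len(bits) - 1)

    c = to_value(chrom[:PARAM_BITS], *c_range)
    gamma = to_value(chrom[PARAM_BITS:2 * PARAM_BITS], *gamma_range)
    mask = chrom[2 * PARAM_BITS:]  # 1 = keep this training instance
    return c, gamma, mask


def evolve(fitness, n_instances, seed=0):
    """Run the GA and return the best chromosome seen during the search."""
    rng = random.Random(seed)
    pop = [random_chromosome(n_instances, rng) for _ in range(POP_SIZE)]
    best = max(pop, key=fitness)
    for _ in range(GENERATIONS):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[:POP_SIZE // 2]   # truncation selection (assumption)
        children = [ranked[0][:]]          # elitism: carry the current best over
        while len(children) < POP_SIZE:
            a, b = rng.sample(parents, 2)
            child = a[:]
            if rng.random() < CROSSOVER_RATE:
                cut = rng.randrange(1, len(a))   # one-point crossover
                child = a[:cut] + b[cut:]
            # flip each bit independently with probability MUTATION_RATE
            child = [1 - g if rng.random() < MUTATION_RATE else g for g in child]
            children.append(child)
        pop = children
        best = max(pop + [best], key=fitness)
    return best


def toy_fitness(chrom):
    """Hypothetical stand-in: counts selected instances instead of scoring an SVM."""
    return sum(chrom[2 * PARAM_BITS:])
```

In an actual experiment the fitness function would decode the chromosome, train an SVM with the decoded parameters on the instances whose mask bit is 1, and return its accuracy on data held out from training.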

DEVELOPMENT OF SAFETY-BASED LEVEL-OF-SERVICE CRITERIA FOR ISOLATED SIGNALIZED INTERSECTIONS (독립신호 교차로에서의 교통안전을 위한 서비스수준 결정방법의 개발)

  • Dr. Tae-Jun Ha
    • Proceedings of the KOR-KST Conference / 1995.02a / pp.3-32 / 1995
  • The Highway Capacity Manual specifies procedures for evaluating intersection performance in terms of delay per vehicle. What is lacking in the current methodology is a comparable quantitative procedure for assessing the safety-based level of service provided to motorists. The objective of the research described herein was to develop a computational procedure for evaluating the safety-based level of service of signalized intersections based on the relative hazard of alternative intersection designs and signal timing plans. Conflict opportunity models were developed for those crossing, diverging, and stopping maneuvers which are associated with left-turn and rear-end accidents. Safety-based level-of-service criteria were then developed based on the distribution of conflict opportunities computed from the developed models. A case study evaluation of the level-of-service analysis methodology revealed that the developed safety-based criteria were not as sensitive to changes in prevailing traffic, roadway, and signal timing conditions as the traditional delay-based measure. However, the methodology did permit a quantitative assessment of the trade-off between delay reduction and safety improvement. The Highway Capacity Manual (HCM) specifies procedures for evaluating intersection performance in terms of a wide variety of prevailing conditions such as traffic composition, intersection geometry, traffic volumes, and signal timing (1). At the present time, however, performance is only measured in terms of delay per vehicle. This is a parameter which is widely accepted as a meaningful and useful indicator of the efficiency with which an intersection is serving traffic needs. What is lacking in the current methodology is a comparable quantitative procedure for assessing the safety-based level of service provided to motorists. For example, it is well known that the change from permissive to protected left-turn phasing can reduce left-turn accident frequency.
However, the HCM only permits a quantitative assessment of the impact of this alternative phasing arrangement on vehicle delay. It is left to the engineer or planner to subjectively judge the level of safety benefits, and to evaluate the trade-off between the efficiency and safety consequences of the alternative phasing plans. Numerous examples of other geometric design and signal timing improvements could also be given. At present, the principal methods available to the practitioner for evaluating the relative safety at signalized intersections are: a) the application of engineering judgement, b) accident analyses, and c) traffic conflicts analysis. Reliance on engineering judgement has obvious limitations, especially when placed in the context of the elaborate HCM procedures for calculating delay. Accident analyses generally require some type of before-after comparison, either for the case study intersection or for a large set of similar intersections. In either situation, there are problems associated with compensating for regression-to-the-mean phenomena (2), as well as obtaining an adequate sample size. Research has also pointed to potential bias caused by the way in which exposure to accidents is measured (3, 4). Because of the problems associated with traditional accident analyses, some have promoted the use of the traffic conflicts technique (5). However, this procedure also has shortcomings in that it requires extensive field data collection and trained observers to identify the different types of conflicts occurring in the field. The objective of the research described herein was to develop a computational procedure for evaluating the safety-based level of service of signalized intersections that would be compatible and consistent with that presently found in the HCM for evaluating efficiency-based level of service as measured by delay per vehicle (6).
The intent was not to develop a new set of accident prediction models, but to design a methodology to quantitatively predict the relative hazard of alternative intersection designs and signal timing plans.

Spatial Distribution Patterns and Prediction of Hotspot Area for Endangered Herpetofauna Species in Korea (국내 멸종위기양서·파충류의 공간적 분포형태와 주요 분포지역 예측에 대한 연구)

  • Do, Min Seock;Lee, Jin-Won;Jang, Hoan-Jin;Kim, Dae-In;Park, Jinwoo;Yoo, Jeong-Chil
    • Korean Journal of Environment and Ecology / v.31 no.4 / pp.381-396 / 2017
  • Understanding species distribution plays an important role in conservation as well as evolutionary biology. In this study, we applied a species distribution model to predict hotspot areas and habitat characteristics for endangered herpetofauna species in South Korea: the Korean Crevice Salamander (Karsenia koreana), Suweon tree frog (Hyla suweonensis), Gold-spotted pond frog (Pelophylax chosenicus), Narrow-mouthed toad (Kaloula borealis), Korean ratsnake (Elaphe schrenckii), Mongolian racerunner (Eremias argus), Reeve's turtle (Mauremys reevesii) and Soft-shelled turtle (Pelodiscus sinensis). The Kori salamander (Hynobius yangi) and Black-headed snake (Sibynophis chinensis) were excluded from the analysis due to insufficient sample size. The results showed that altitude was the most important environmental variable for their distribution, and the altitude at which these species were distributed correlated with the climate of that region. The predicted distribution area derived from the species distribution modelling adequately reflected the observation sites used in this study as well as those reported in preceding studies. The average AUC value of the eight species was relatively high ($0.845 \pm 0.08$), while the average omission rate was relatively low ($0.087 \pm 0.01$). Therefore, the species overlaying model created for the endangered species is considered successful. Merging the distribution models showed that five species shared habitats in the coastal areas of Gyeonggi-do and Chungcheongnam-do, in the western part of the Korean Peninsula. We therefore suggest that protection should be a high priority in these areas, and our overall results may serve as essential and fundamental data for the conservation of endangered amphibians and reptiles in Korea.

Studies on Discrimination between Organic Rice and Non-organic Rice using Natural Abundance of Stable Isotope Nitrogen($\delta^{15}N$) (질소 안정동위원소 자연존재비($\delta^{15}N$)를 이용한 유기벼와 일반벼 판별법 탐색)

  • Lee, Hyo-Won;Lee, Sang-Mo
    • Korean Journal of Organic Agriculture / v.18 no.2 / pp.257-269 / 2010
  • To investigate whether organic and non-organic rice can be discriminated using the natural abundance of the stable nitrogen isotope, 17 organic rice samples and 13 non-organic rice samples grown in fields adjoining the organic rice fields were collected in 2008. The rice was separated into brown rice, milled rice and hull, and the samples were analysed for nitrogen and $\delta^{15}N$ at NICEM. The authors also asked both the organic and the non-organic farmers about their N sources. To determine whether $\delta^{15}N$ can be used to discriminate between organic and non-organic rice, discriminant analysis was carried out with SPSS and with the logistic method. 1. Organic farmers used manure, rice bran, used mushroom culture, fermented fertilizer (company products), and oil cake, whereas non-organic farmers applied compound fertilizer. Rice straw was left in the organic rice fields but removed from the non-organic fields. 2. $\delta^{15}N$ differed between organic rice and its byproduct (7.760‰ in hull, 6.720‰ in rice), but the difference was not significant. The trend was the same across provinces. Non-organic rice showed similar results. 3. Significant differences in $\delta^{15}N$ were found between organic rice and non-organic rice (p<0.01) and between the hull of organic rice and that of non-organic rice (p<0.05). $\delta^{15}N$ therefore appears to be a useful criterion for discriminating organic from non-organic rice. 4. Discriminant analysis with SPSS and with the logistic method showed significant differences between organic rice, non-organic rice, and their byproducts, except for brown rice and hull with the SPSS method. Hull was the most useful component for predicting unknown samples, with a probability of 83.3%.

Determinants of IPO Failure Risk and Price Response in Kosdaq (코스닥 상장 시 실패위험 결정요인과 주가반응에 관한 연구)

  • Oh, Sung-Bae;Nam, Sam-Hyun;Yi, Hwa-Deuk
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.5 no.4 / pp.1-34 / 2010
  • Failure rates of Kosdaq IPO firms have recently been increasing and their survival rates tend to be very low. When these firms fail, often after receiving government financial support, they may inflict severe financial damage on investors and on the economy as a whole. To ensure investors' confidence in Kosdaq and foster promising and healthy businesses, it is necessary to assess their intrinsic values and survivability precisely. This study investigates what contributed to the failure of IPO firms and analyzes how these factors are reflected in the firms' stock returns. Failure risks are assessed at the time of IPO. This paper considers factors reflecting IPO characteristics, including underwriter prestige, auditor quality, IPO offer price, firm age, and IPO proceeds. The study further examines how, if at all, these failure risks at the time of IPO are reflected in post-IPO stock prices. The sample includes 98 Kosdaq firms that failed and 569 healthy firms classified into the same business categories, and Logit models are used to estimate the probability of failure. Empirical results indicate that auditor quality, IPO offer price, firm age, and IPO proceeds show significant relevance to failure risks at the time of IPO. Among the other variables, firm size and ROA, previously deemed significantly related to failure risks, do not in fact show significant relevance to those risks, whereas financial leverage does. This illustrates the efficacy of a model that appropriately reflects the attributes of IPO firms. Also, even though previous studies held R&D expenditures to be value relevant, this study finds that R&D is not a significant factor related to failure risks. In examining the relation between failure risks and stock prices, this study finds that failure risks are negatively related to one- or two-year size-adjusted abnormal returns after IPO. 
The results of this study may provide useful knowledge for government regulatory officials in contemplating pertinent policy and for credit analysts in their proper evaluation of a firm's credit standing.

Sequencing analysis of the OFC1 gene on the nonsyndromic cleft lip and palate patient in Korean (한국인 비증후군성 구순구개열 환자의 OFC1 유전자의 서열 분석)

  • Kim, Sung-Sik;Son, Woo-Sung
    • The korean journal of orthodontics / v.33 no.3 s.98 / pp.185-197 / 2003
  • This study was performed to identify the characteristics in Korean patients of the OFC1 gene (locus: chromosome 6p24.3), which is assumed to be the major gene behind nonsyndromic cleft lip and palate. The sample consisted of 80 subjects: 40 nonsyndromic cleft lip and palate patients (probands; 20 males and 20 females, mean age 14.2 years) and 40 normal adults (20 males and 20 females, mean age 25.6 years). Using a PCR-based assay, the OFC1 gene was amplified and sequenced, and similar protein structures were then searched for. The results were as follows: 1. The OFC1 gene contains the microsatellite marker 'CA' repeats. The reference sequence had 21 'CA' repeats in the form TA(CA)11TA(CA)10. In Koreans, however, the number of tandem 'CA' repeats varied from 17 to 26, except 18, and the repeats took the form TA(CA)n. 2. Nine allelic variants were found. The distribution of OFC1 alleles was similar between the patient and control groups. 3. Compared with Weissenbach's report, the base 'T' was replaced by 'C' after 11 tandem 'CA' repeats in Koreans. However, this difference did not appear to change the ORF prediction results between Koreans and Weissenbach's report. 4. The BLAST search returned telomerase reverse transcriptase (TERT) and nucleotide binding protein 2 (NBP2) as similar proteins. TERT is a protein product of the hTERT gene at locus 5p15.33 (NCBI Genome Annotation; NT023089). NBP2 is a protein product of the ABCC3 (ATP-binding cassette, sub-family C) gene at locus 17q22 (NCBI Genome Annotation; NT010783). 5. In the Pedant-Pro database analysis, the predicted protein structure of the OFC1 gene had at least one transmembrane region and one non-globular region.

Nose Changes after Maxillary Advancement Surgery in Skeletal Class III Malocclusion (골격성 III급 부정교합자에서 상악골 전방 이동술 후 코의 변화에 관한 연구)

  • Kang, Eun-Hee;Park, Soo-Byung;Kim, Jong-Ryoul
    • The korean journal of orthodontics / v.30 no.5 s.82 / pp.657-668 / 2000
  • The purpose of this study was to evaluate the amount and interrelationship of nasal soft tissue and maxillary changes, and to identify the nasal morphologic features that indicate susceptibility to nasal deflection, so that they can be used in presurgical prediction of nasal changes after maxillary advancement surgery in skeletal Class III malocclusion. The sample consisted of 25 adult patients (13 males and 12 females) who had severe anteroposterior skeletal discrepancy. The patients had received presurgical orthodontic treatment. They underwent a Le Fort I advancement osteotomy, rigid internal fixation, alar cinch suture and V-Y advancement lip closure. The presurgical and postsurgical lateral cephalograms and lateral and frontal facial photographs were evaluated, and a computerized statistical analysis was carried out. Ratios of nasal soft tissue change to h point change were calculated by regression equations. The results were as follows: 1. The correlation between maxillary hard tissue horizontal changes and nasal soft tissue vertical changes was high, and the ${\beta}_0$ for soft tissue to ADV was 0.228 at ANt and 0.257 at SNt. 2. The correlation between maxillary hard tissue and nasal soft tissue horizontal changes was high, and the ${\beta}_0$ for soft tissue to ADV was 0.484 at ANt, 0.431 at SNt, and 0.806 at Sn. 3. The correlation between maxillary hard tissue horizontal changes and changes in the width of the ala of the nose was high, and the ${\beta}_0$ for alar base width ratio to ADV was 0.002. 4. The DRI, prominence of nose, and Pre-Op CA are not quantitative measures that can be used clinically to improve the predictability of vertical and horizontal nasal tip deflection. In this study, increases in nasal tip projection and anterosuperior rotation occurred when there was an anterior vector of maxillary movement. These nasal changes were quantitatively correlated with the magnitude of maxillary (A point) movement.

Comparison of Size Criteria in Mediastinal Lymph Node Involvement of Adenocarcinoma of Lungs (폐 선암의 종격동 림프절 전이에 있어서 림프절 크기 기준의 비교)

  • Gu, Ki-Seon;Kuk, Hiang;Koh, Hyeck-Jae;Yang, Sei-Hun;Jeong, Eun-Taik
    • Tuberculosis and Respiratory Diseases / v.46 no.4 / pp.542-547 / 1999
  • Background: Determining mediastinal lymph node involvement of lung cancer by CT scan is important and valuable for treatment planning and prognosis prediction. In general, a long diameter of more than 15mm is used as the criterion for lung cancer involvement of a mediastinal lymph node. Adenocarcinoma tends toward early distant metastasis and micrometastasis, so it may involve lymph nodes early, and the involvement may go undetected until the nodes are sufficiently enlarged. The authors tried to determine the difference between two size criteria (15mm, 10mm) for detecting cancer involvement in adenocarcinoma. Methods: The sample comprised 60 cases (male 46, female 14; median age 61.5 years). By pathology: squamous cancer 41, large cell cancer 2, adenocarcinoma 17. By TNM stage: I 23, III 24, IIIA 13. Results: The mean long diameter of involved lymph nodes was 16.0 ($\pm 8.0$) mm in the non-adenocarcinoma group and 12.0 ($\pm 3.2$) mm in the adenocarcinoma group (p<0.05). With a long diameter larger than 15mm as the involvement criterion, the sensitivity, specificity, positive predictive index, negative predictive index, and accuracy were 54%, 100%, 100%, 83%, and 86% in the non-adenocarcinoma group and 43%, 90%, 75%, 69%, and 71% in the adenocarcinoma group. With a long diameter larger than 10mm as the criterion, they were 65%, 77%, 61%, 92%, and 79% in the non-adenocarcinoma group and 100%, 80%, 78%, 100%, and 88% in the adenocarcinoma group. Conclusion: A long diameter larger than 10mm is the more valuable criterion for lymph node involvement in adenocarcinoma of the lungs.
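
The five indices reported above all follow from a 2x2 table of CT call versus pathology. The sketch below shows the arithmetic; the cell counts are not given in the abstract and are back-calculated here as an assumption (7 involved and 10 uninvolved cases among the 17 adenocarcinoma patients), chosen because they reproduce the reported adenocarcinoma percentages after rounding. The function name `diagnostic_metrics` is illustrative.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return the five indices used in the abstract from 2x2 table counts."""
    return {
        "sensitivity": tp / (tp + fn),               # involved nodes correctly called positive
        "specificity": tn / (tn + fp),               # uninvolved nodes correctly called negative
        "ppv": tp / (tp + fp),                       # positive predictive index
        "npv": tn / (tn + fn),                       # negative predictive index
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Assumed counts for the adenocarcinoma group (n = 17); not stated in the source.
adeno_15mm = diagnostic_metrics(tp=3, fp=1, fn=4, tn=9)
adeno_10mm = diagnostic_metrics(tp=7, fp=2, fn=0, tn=8)

for label, result in (("15mm", adeno_15mm), ("10mm", adeno_10mm)):
    print(label, {name: round(100 * value) for name, value in result.items()})
```

Under these assumed counts, lowering the threshold from 15mm to 10mm turns four false negatives into true positives at the cost of one additional false positive, which is the trade-off the conclusion describes.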

Development of Stand Yield Table Based on Current Growth Characteristics of Chamaecyparis obtusa Stands (현실임분 생장특성에 의한 편백 임분수확표 개발)

  • Jung, Su Young;Lee, Kwang Soo;Lee, Ho Sang;Ji Bae, Eun;Park, Jun Hyung;Ko, Chi-Ung
    • Journal of Korean Society of Forest Science / v.109 no.4 / pp.477-483 / 2020
  • We constructed a stand yield table for Chamaecyparis obtusa based on data from an actual forest. The previous stand yield table had a number of disadvantages because it was not based on actual forest information. In the present study we used data from more than 200 sampling plots in a stand of Chamaecyparis obtusa. The analysis included the estimation, recovery and prediction of the distribution of values for diameter at breast height (DBH), and the result is a valuable process for the preparation of stand yield tables. The DBH distribution model uses a Weibull function, and the site index (base age: 30 years), the standard for assessing forest productivity, was derived using the Chapman-Richards formula. Several estimation formulas for the preparation of the stand yield table were compared on the fitness index, and the optimal formula was chosen. The analysis shows that the site index ranges from 10 to 18 in the Chamaecyparis obtusa stand. The estimated stand volume of each sample plot was found to have an accuracy of 62%. In the residual analysis, the residuals were evenly distributed around zero, which indicates that the results are useful in the field. Comparing the table constructed in this study with the existing stand yield table, we found that our table yielded comparatively higher values for growth. This is probably because the existing table was built from a small amount of research data that did not properly reflect actual stand growth. We hope that the stand yield table of Chamaecyparis obtusa, a representative species of southern regions, will be widely used for forest management. As these forests stabilize and growth progresses, we plan to construct an additional yield table applicable to the production of developed stands.
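
The two model families named above can be illustrated with their common textbook forms. This is a sketch under stated assumptions: the abstract does not give the parameterization or any fitted coefficients, so the three-parameter Weibull form, the Chapman-Richards form H = a(1 - exp(-b t))^c, the anamorphic site-index conversion, and every numeric value used with them are assumptions for illustration only.

```python
import math


def weibull_pdf(dbh, loc, scale, shape):
    """Three-parameter Weibull density for diameter at breast height (cm)."""
    if dbh <= loc:
        return 0.0
    z = (dbh - loc) / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


def weibull_cdf(dbh, loc, scale, shape):
    """Share of trees with a diameter of at most dbh."""
    if dbh <= loc:
        return 0.0
    return 1.0 - math.exp(-(((dbh - loc) / scale) ** shape))


def chapman_richards(age, a, b, c):
    """Dominant height (m) at a given stand age; a is the asymptote."""
    return a * (1.0 - math.exp(-b * age)) ** c


def site_index(height, age, b, c, base_age=30):
    """Project an observed height to the base age (30 years in the abstract),
    assuming all sites share the curve shape given by b and c."""
    ratio = (1.0 - math.exp(-b * base_age)) / (1.0 - math.exp(-b * age))
    return height * ratio ** c


# Hypothetical example: share of stems in the 20-24 cm class, and the site
# index of a 20-year-old stand whose dominant height is 11 m.
class_share = weibull_cdf(24, 6.0, 16.0, 2.4) - weibull_cdf(20, 6.0, 16.0, 2.4)
si = site_index(height=11.0, age=20, b=0.05, c=1.3)
```

Multiplying each diameter-class share by the stems per hectare gives the stand table from which volume per class can be summed.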

The Clinical Utility of Korean Bayley Scales of Infant and Toddler Development-III - Focusing on using of the US norm - (베일리영유아발달검사 제3판(Bayley-III)의 미국 규준 적용의 문제: 미숙아 집단을 대상으로)

  • Lim, Yoo Jin;Bang, Hee Jeong;Lee, Soonhang
    • Korean journal of psychology:General / v.36 no.1 / pp.81-107 / 2017
  • This study investigates the clinical utility in Korea of the Bayley-III scored with the US norm. A total of 98 preterm infants and 93 term infants were assessed with the K-Bayley-III. The performance pattern of preterm infants was analyzed with a mixed-design ANOVA, which examined differences in Bayley-III scaled scores and composite scores between the full-term and preterm groups and within the preterm group. We then investigated the agreement between classifications of delay made using the BSID-II and the Bayley-III. In addition, ROC plots were constructed to identify a Bayley-III cut-off score with optimum diagnostic utility in this sample. The results were as follows. (1) Preterm infants functioned at significantly lower levels than term infants on 5 scaled scores and 3 developmental indexes. Significant differences among scores within the preterm group were also found. (2) The Bayley-III yielded higher scores than the K-BSID-II on the Mental Development Index and Psychomotor Developmental Index, and lower rates of developmental delay. (3) All Bayley-III scales (Cognitive, Language and Motor) had an appropriate level of discrimination, but the Bayley-III cut-off composite scores had to be set 13 to 28 points higher than 69 to predict delay as defined by the K-BSID-II. This explains the lower rates of developmental delay obtained with the two-standard-deviation criterion. This study provides empirical evidence that scores must be interpreted carefully in clinical applications, identifies the discriminating power of the scales, and proposes more appropriate cut-off scores. In addition, sampling for the Korean norm of the Bayley-III is discussed. Infants in Korea should preferably be assessed against validated Korean norms, and the standardization process for obtaining Korean normative data must be performed carefully.
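
One common way to read a cut-off from an ROC analysis is to pick the threshold that maximizes Youden's J (sensitivity + specificity - 1). The abstract does not say which optimality criterion the authors used, so Youden's J is an assumption here, and the scores and delay labels below are hypothetical. The sketch treats a composite score at or below the cut-off as a positive (delayed) call.

```python
def best_cutoff(scores, delayed):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    scores  : composite scores, one per child
    delayed : 1 if the reference test classifies the child as delayed, else 0
    Ties are resolved in favor of the lowest cut-off.
    """
    n_pos = sum(delayed)
    n_neg = len(delayed) - n_pos
    best = None
    for cut in sorted(set(scores)):
        tp = sum(1 for s, d in zip(scores, delayed) if d and s <= cut)
        tn = sum(1 for s, d in zip(scores, delayed) if not d and s > cut)
        sens = tp / n_pos
        spec = tn / n_neg
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, cut, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical data: eight children, four classified as delayed by the reference test.
scores = [70, 75, 80, 85, 90, 95, 100, 105]
delayed = [1, 1, 1, 0, 1, 0, 0, 0]
cutoff, sens, spec = best_cutoff(scores, delayed)
```

With real data the candidate cut-offs would be the observed composite scores, and the reference classification would come from the comparison instrument.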