• Title/Summary/Keyword: Out-of-Sample Prediction


Forecasting KOSPI Return Using a Modified Stochastic AdaBoosting

  • Bae, Sangil;Jeong, Minsoo
    • East Asian Economic Review / v.25 no.4 / pp.403-424 / 2021
  • AdaBoost adjusts the sample weights of the training set at each iteration; however, it has been shown to produce increasingly correlated errors as boosting proceeds when the models' accuracy is high enough. Therefore, in this study, we propose a novel way to improve the performance of the existing AdaBoost algorithm by employing heterogeneous models and a stochastic twist. The heterogeneous ensemble ensures that models with different initial assumptions about the data are used, which improves diversity. In addition, by using a stochastic algorithm with a decaying convergence rate, the model is designed to balance the trade-off between prediction performance and convergence. The results show that the stochastic algorithm with a decaying convergence rate improved performance and outperformed other existing boosting techniques.
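
For orientation, the sample-weight adjustment mentioned in this abstract can be sketched with the standard discrete AdaBoost update. This is the textbook rule, not the authors' modified stochastic variant, whose details the abstract does not give; the function name `adaboost_reweight` and the toy inputs are illustrative assumptions.

```python
import math

def adaboost_reweight(weights, misclassified):
    """One round of the standard discrete AdaBoost weight update.

    weights: current sample weights, summing to 1.
    misclassified: booleans, True where the weak learner was wrong.
    Returns (alpha, new_weights). Textbook AdaBoost only; this is NOT
    the modified stochastic variant proposed in the paper.
    """
    err = sum(w for w, miss in zip(weights, misclassified) if miss)
    if not 0.0 < err < 1.0:
        raise ValueError("weighted error must lie strictly between 0 and 1")
    # Weight of this weak learner in the final vote
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Raise the weights of misclassified samples, lower the others
    raw = [w * math.exp(alpha if miss else -alpha)
           for w, miss in zip(weights, misclassified)]
    total = sum(raw)
    return alpha, [w / total for w in raw]

# Hypothetical toy round: four equally weighted samples, the last one misclassified
alpha, new_weights = adaboost_reweight([0.25] * 4, [False, False, False, True])
```

After the update the misclassified samples together carry half of the total weight, which is what pushes the next weak learner toward the cases the previous one got wrong.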

A Prediction Model for TVOC and HCHO Emission of Paint Materials (페인트에서 방출되는 TVOC 및 HCHO 방출량 예측모델)

  • Kim, Hyung-Soo;Lee, Kyung-Hoi
    • KIEAE Journal / v.3 no.1 / pp.13-20 / 2003
  • The need for protection against indoor air pollution is widely recognized as environmental pollution grows; people spend more than 80 percent of their time indoors. Concern about indoor decoration materials is therefore growing, since they pollute the rooms of apartments as well as offices. As indoor decoration materials become more diverse and luxurious, the effect of VOCs (Volatile Organic Compounds) and HCHO (formaldehyde) is growing. These materials cause Sick Building Syndrome, with symptoms such as headaches, dizziness, and lack of concentration, which in turn seriously harm people's health. In this study, I examined the status of indoor air pollution and investigated and analyzed prevention techniques. To do so, I performed experimental tests and an assessment of the indoor decoration materials of an apartment. I also examined the emitted substances and their emission. Finally, I examined the emission characteristics under changing environmental conditions such as temperature, humidity, and ventilation. For the VOC tests, I applied the solid adsorption method using an adsorbent tube, based on the American EPA TO-17 and ASTM 5116-97 methods and the method of the Japanese Wall Decoration Industrial Association. The sampled material was analyzed by High Performance Liquid Chromatography after solvent extraction. Paint was selected as the test subject. The test proceeded as follows: first, I characterized the emission by measuring the concentrations of VOCs and HCHO emitted from the indoor decoration materials of an apartment. Second, I built a small-scale chamber and ran the test in it in order to support the development of an environment-friendly prediction model.

Estimating the unconfined compression strength of low plastic clayey soils using gene-expression programming

  • Muhammad Naqeeb Nawaz;Song-Hun Chong;Muhammad Muneeb Nawaz;Safeer Haider;Waqas Hassan;Jin-Seop Kim
    • Geomechanics and Engineering / v.33 no.1 / pp.1-9 / 2023
  • The unconfined compression strength (UCS) of soils is commonly used either before or during the construction of geo-structures. In the pre-design stage, UCS as a mechanical property is obtained through a laboratory test that involves cumbersome procedures and high costs for in-situ sampling and sample preparation. As an alternative, an empirical model established from a limited number of test cases is used to estimate the UCS economically. However, the many parameters affecting the 1D soil compression response hinder the use of traditional statistical analysis. In this study, gene expression programming (GEP) is adopted to develop a prediction model of UCS from commonly used soil properties that affect it. A total of 79 undisturbed soil samples were collected, of which 54 were used to generate the predictive model and 25 to validate it. Experimental studies were conducted to measure the unconfined compression strength and basic soil index properties. The performance of the prediction model was assessed using statistical checks including the correlation coefficient (R), the root mean square error (RMSE), the mean absolute error (MAE), and the relative squared error (RSE), together with external criteria checks. The prediction model achieved excellent accuracy, with R, RMSE, MAE, and RSE values of 0.98, 10.01, 7.94, and 0.03, respectively, for the training data and 0.92, 19.82, 14.56, and 0.15, respectively, for the testing data. The sensitivity analysis and parametric study show that the liquid limit and fines content are the most sensitive parameters, whereas the sand content is the least critical.
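
The four statistical checks named in this abstract can be computed as in the sketch below. The definitions are the conventional ones (Pearson correlation for R, and RSE as the squared error divided by the total sum of squares about the observed mean); the abstract does not spell them out, so treat them as assumptions. The function name and sample values are hypothetical.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (R, RMSE, MAE, RSE) for paired observed and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    # Pearson correlation coefficient between observations and predictions
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    r = cov / math.sqrt(var_t * var_p)
    # Error-based measures
    sq_err = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    rmse = math.sqrt(sq_err / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    # Relative squared error: squared error relative to a mean-only predictor
    rse = sq_err / var_t
    return r, rmse, mae, rse

# Hypothetical observed and predicted strengths, for illustration only
observed = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]
r, rmse, mae, rse = regression_metrics(observed, predicted)
```

Computing the same statistics separately on the training and held-out samples, as the abstract reports, is what exposes the gap between in-sample fit and out-of-sample accuracy.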

Multi-dimensional Analysis and Prediction Model for Tourist Satisfaction

  • Shrestha, Deepanjal;Wenan, Tan;Gaudel, Bijay;Rajkarnikar, Neesha;Jeong, Seung Ryul
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.2 / pp.480-502 / 2022
  • This work assesses the degree of satisfaction tourists receive as final recipients in a tourism destination, based on the fact that satisfied tourists can make a significant contribution to the growth and continuous improvement of a tourism business. The work takes Pokhara, the tourism capital of Nepal, as its study area. A stratified sampling methodology with open-ended survey questions is used as the primary source of data, with a sample size of 1019 covering both international and domestic tourists. The survey data are processed using a data mining tool to perform multi-dimensional analysis, discover information patterns, and visualize clusters. Further, the supervised machine learning algorithms kNN, Decision tree, Support vector machine, Random forest, Neural network, Naive Bayes, and Gradient boost are used to develop models for training and prediction on the survey data. To find the best model for prediction, different performance metrics are used to evaluate each model for performance, accuracy, and robustness. The best model is used to construct a learning-enabled model that classifies tourists as satisfied, neutral, or unsatisfied visitors. This work is very important for tourism business personnel, government agencies, and tourism stakeholders seeking information on tourist satisfaction and the factors that influence it. Though this work was carried out for Pokhara city in Nepal, the study is equally relevant to any other tourism destination of a similar nature.

Measurement of lipid content of compost fermentation using near-infrared spectroscopy

  • Daisuke Masui;Suehara, Ken-ichiro;Yasuhisa Nakano;Takuo Yano
    • Near Infrared Analysis / v.2 no.1 / pp.37-42 / 2001
  • Near infrared spectroscopy (NIRS) was applied to determine the lipid content of compost during the compost fermentation of tofu (soybean-curd) refuse. Lipid absorption was observed at five wavelengths, 1208, 1712, 1772, 2312 and 2352 nm, in the second derivative spectra. To formulate a calibration equation, a multiple linear regression analysis was carried out between the near-infrared spectral data and the lipid content of the calibration sample set (sample number, n=60) obtained using the Soxhlet extraction method. The multiple correlation coefficient (R) was 0.975 when the wavelengths of 1208 and 1712 nm were used in the calibration equation. To validate the calibration equation, the lipid content of a validation sample set (n=35) not used for formulating the calibration equation was calculated with it and compared with the values obtained using the Soxhlet extraction method. Good agreement was observed between the results of the Soxhlet extraction method and those of the NIRS method. The simple correlation coefficient (r) and standard error of prediction (SEP) were 0.964 and 0.815%, respectively. The suitability of the lipid content as an indicator of the compost fermentation of tofu refuse was also studied. The decrease in the lipid content of the compost corresponded to the decrease in the total dry weight of the compost in the composter. The lipid content was a significant indicator of the compost fermentation. The NIRS method was applied to measure the time course of the lipid content during compost fermentation, and good results were obtained. The study indicates that NIRS is a useful method for process management of the compost fermentation of tofu refuse.
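
As a rough illustration of the validation step, the sketch below computes the bias and a standard error of prediction (SEP) between reference values and NIRS predictions. It assumes the bias-corrected definition common in chemometrics (standard deviation of the residuals about their mean, with n - 1 in the denominator); the abstract does not state which definition was used, and the function name and numbers are hypothetical.

```python
import math

def bias_and_sep(reference, predicted):
    """Return (bias, SEP) for a validation set.

    bias is the mean residual (predicted - reference); SEP is the standard
    deviation of the residuals about that bias, using n - 1. This
    bias-corrected form is an assumption, not taken from the abstract.
    """
    residuals = [p - r for r, p in zip(reference, predicted)]
    n = len(residuals)
    bias = sum(residuals) / n
    sep = math.sqrt(sum((d - bias) ** 2 for d in residuals) / (n - 1))
    return bias, sep

# Hypothetical lipid contents (%): reference method vs. calibration equation
reference = [10.0, 12.0, 14.0, 16.0]
predicted = [10.5, 12.5, 13.5, 16.5]
bias, sep = bias_and_sep(reference, predicted)
```

The key point is that the validation samples are not used to fit the calibration equation, so the SEP measures out-of-sample error.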

The Weathering Index and Prediction of Uniaxial Compressive Strength for Chung-Ju Granite (충주 지역 화강암의 풍화지수 및 일축압축강도 추정에 관한 연구)

  • Eom, Tae-Uk;Kim, Hak-Mun;Kim, Chan-Kuk;Jang, Kyung-Jun;Pyo, Myung-Ryul
    • Proceedings of the Korean Geotechnical Society Conference / 2008.03a / pp.863-874 / 2008
  • The engineering properties of rock must be judged accurately in order to design and construct rock structures safely and economically. Among rock tests, the UCS (Uniaxial Compressive Strength) test result is a very important factor used in a variety of ways in the design and construction of underground structures and in rock slope and foundation analysis. However, the UCS test has the disadvantage of requiring intact sample preparation, because the sample must be a regular cylinder, cube, or rectangular prism. To address this problem, indirect tests such as the point load test, Schmidt hammer test, absorption test, and dry density test are used to predict the UCS of rock. Samples for these tests are easy to prepare and the tests are convenient to carry out, so they are simple and cost less. The Schmidt hammer test is frequently used on construction sites because it is handy and easy to use, but there is concern about misuse when the specification of each Schmidt hammer is not distinguished. This study therefore proposes estimation formulas for each Schmidt hammer specification, as well as for the point load test, absorption test, and dry density. We compared the estimation formulas and their R-squared values with previously proposed Schmidt rebound assessment methods. In addition, based on the tests, we provide the range of the weathering index for each weathering grade.


Dynamic Nonlinear Prediction Model of Univariate Hydrologic Time Series Using the Support Vector Machine and State-Space Model (Support Vector Machine과 상태공간모형을 이용한 단변량 수문 시계열의 동역학적 비선형 예측모형)

  • Kwon, Hyun-Han;Moon, Young-Il
    • KSCE Journal of Civil and Environmental Engineering Research / v.26 no.3B / pp.279-289 / 2006
  • The reconstruction of low-dimensional nonlinear behavior from hydrologic time series has been an active area of research in the last decade. In this study, we apply a powerful state space reconstruction methodology based on Support Vector Machines (SVM) to the Great Salt Lake (GSL) volume. SVMs are machine learning systems that use a hypothesis space of linear functions in a kernel-induced higher-dimensional feature space. SVMs are optimized by minimizing a bound on a generalized error (risk) measure, rather than just the mean square error over a training set. The utility of this SVM regression approach is demonstrated through applications to short-term forecasts of the biweekly GSL volume. The SVM-based reconstruction is used to develop time series forecasts for multiple lead times ranging from two weeks to several months. The reliability of the algorithm in learning and forecasting the dynamics is tested using split-sample sensitivity analyses, with a particular interest in forecasting extreme states. Unlike previously reported methodologies, SVMs are able to extract the dynamics using only a few past observed data points (Support Vectors, SV) out of the training examples. In terms of statistical measures, the SVM-based prediction model showed encouraging and promising results in short-term prediction. Thus, the SVM method presented in this study offers a competitive methodology for forecasting hydrologic time series.
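
State space reconstruction from a single series is usually done with time-delay embedding, which turns the series into (state vector, future value) pairs that a regression model such as an SVM can learn from. The sketch below shows only that data-preparation step; the embedding dimension, lag, and lead time are illustrative assumptions, since the abstract does not report the values used, and `delay_embed` is a hypothetical helper name.

```python
def delay_embed(series, dim, lag, horizon):
    """Build (state vector, target) pairs from a univariate series.

    Each state vector is [x[t], x[t-lag], ..., x[t-(dim-1)*lag]] and the
    target is x[t + horizon], the value `horizon` steps ahead.
    """
    start = (dim - 1) * lag  # first index with a full history available
    pairs = []
    for t in range(start, len(series) - horizon):
        state = [series[t - k * lag] for k in range(dim)]
        pairs.append((state, series[t + horizon]))
    return pairs

# Hypothetical biweekly volume readings, for illustration only
volume = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
pairs = delay_embed(volume, dim=3, lag=1, horizon=1)
```

For a split-sample test of the kind the abstract describes, the pairs would be divided chronologically so that the model is fitted on the earlier part and evaluated on the later part.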

The Prediction of DEA based Efficiency Rating for Venture Business Using Multi-class SVM (다분류 SVM을 이용한 DEA기반 벤처기업 효율성등급 예측모형)

  • Park, Ji-Young;Hong, Tae-Ho
    • Asia Pacific Journal of Information Systems / v.19 no.2 / pp.139-155 / 2009
  • For the last few decades, many studies have tried to explore and unveil venture companies' success factors and unique features in order to identify the sources of such companies' competitive advantages over their rivals. Venture companies have tended to give investors high returns, generally by making the best use of information technology. For this reason, many venture companies are keen on attracting investors' attention. Investors generally make their investment decisions by carefully examining the evaluation criteria of the alternatives. For them, credit rating information provided by international rating agencies such as Standard and Poor's, Moody's, and Fitch is a crucial source on such pivotal concerns as companies' stability, growth, and risk status. However, this type of information is generated only for companies issuing corporate bonds, not for venture companies. Therefore, this study proposes a method for evaluating venture businesses and presents empirical results using financial data of Korean venture companies listed on KOSDAQ in the Korea Exchange. In addition, this paper uses multi-class SVM to predict the DEA-based efficiency rating for venture businesses derived from the proposed method. Our approach sheds light on ways to locate efficient companies generating high levels of profit. Above all, in determining effective ways to evaluate a venture firm's efficiency, it is important to understand the major contributing factors of such efficiency. Therefore, this paper is built on the following two ideas for classifying which venture companies are more efficient: i) making a DEA-based multi-class rating for sample companies and ii) developing a multi-class SVM-based efficiency prediction model for classifying all companies.
First, Data Envelopment Analysis (DEA) is a non-parametric multiple input-output efficiency technique that measures the relative efficiency of decision making units (DMUs) using a linear programming based model. It is non-parametric because it requires no assumption about the shape or parameters of the underlying production function. DEA has already been widely applied to evaluate the relative efficiency of DMUs. Recently, a number of DEA-based studies have evaluated the efficiency of various types of companies, such as internet companies and venture companies. It has also been applied to corporate credit ratings. In this study we used DEA to sort venture companies by efficiency-based ratings. The Support Vector Machine (SVM), on the other hand, is a popular technique for solving data classification problems. In this paper, we employed SVM to classify the efficiency ratings of IT venture companies according to the results of DEA. The SVM method was first developed by Vapnik (1995). As one of many machine learning techniques, SVM is based on statistical learning theory. Thus far, the method has shown good performance, especially in its generalization capacity in classification tasks, resulting in numerous applications in many areas of business. SVM is basically an algorithm that finds the maximum margin hyperplane, which gives the maximum separation between classes. In this method, the support vectors are the training points closest to the maximum margin hyperplane. If the classes cannot be separated linearly, a kernel function can be used. In the case of nonlinear class boundaries, the inputs are transformed into a high-dimensional feature space; that is, the original input space is mapped into a high-dimensional dot-product space. Many studies have applied SVM to bankruptcy prediction, financial time series forecasting, and credit rating estimation. In this study we employed SVM to develop a data mining-based efficiency prediction model.
We used the Gaussian radial basis function as the kernel function of the SVM. For multi-class SVM, we used the one-against-one approach, which is built on binary classification, and two all-together methods proposed by Weston and Watkins (1999) and Crammer and Singer (2000), respectively. In this research, we used corporate information on 154 companies listed on the KOSDAQ market in the Korea Exchange. We obtained the companies' financial information for 2005 from KIS (Korea Information Service, Inc.). Using these data, we produced a multi-class rating from DEA efficiency and built a data mining-based multi-class prediction model. Among the three multi-classification methods, the Weston and Watkins method had the best hit ratio on the test data set. In multi-class problems such as efficiency ratings of venture businesses, it is very useful for investors to know the class to within a one-class error when it is difficult to determine the exact class in the actual market. We therefore presented accuracy results within 1-class errors, and the Weston and Watkins method showed 85.7% accuracy on our test samples. We conclude that the DEA-based multi-class approach for venture businesses generates more information than the binary classification problem, notwithstanding its efficiency level. We believe this model can help investors in decision making, as it provides a reliable tool for evaluating venture companies in the financial domain. For future research, we see the need to improve such areas as the variable selection process, the parameter selection of the kernel function, the generalization, and the sample size for each class.
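
The "accuracy within 1-class errors" reported above can be read as the share of companies whose predicted rating is at most one class away from the actual rating. A minimal sketch of that measure follows, assuming ratings are coded as consecutive integers; the coding, the function name `hit_ratio`, and the sample ratings are hypothetical and not taken from the paper.

```python
def hit_ratio(actual, predicted, tolerance=0):
    """Share of cases whose predicted class is within `tolerance` classes
    of the actual class (tolerance=0 gives the exact-class hit ratio)."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= tolerance)
    return hits / len(actual)

# Hypothetical efficiency ratings coded 1 (best) to 4 (worst)
actual = [1, 2, 3, 4, 2, 3]
predicted = [1, 3, 3, 2, 2, 4]
exact = hit_ratio(actual, predicted)          # exact-class hit ratio
within_one = hit_ratio(actual, predicted, 1)  # allows a one-class error
```

This measure only makes sense when the classes are ordered, as efficiency ratings are; for unordered labels a one-class difference has no meaning.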

Predicting Economic Activity via the Yield Spread: Literature Survey and Empirical Evidence in Korea (이자율 스프레드의 경기 예측력: 문헌 서베이 및 한국의 사례 분석)

  • Yun, Jaeho
    • Economic Analysis / v.26 no.3 / pp.1-47 / 2020
  • This paper surveys research since the 1990s on the ability of the yield spread and its components (i.e., the expectation spread and term premium components) to predict future economic activity, and also conducts an empirical analysis of their forecasting ability using yield data for Korean government bonds. The survey, particularly for the US, shows that the yield spread has significant predictive power for some macroeconomic variables, but that since the mid-1980s its predictive power seems to have declined, possibly due to stronger inflation targeting. The empirical analysis using Korean data indicates that the yield spread, and the term premium component in particular, has significant predictive power for industrial production (IP) growth, consumer price index growth, and the IP gap. An out-of-sample analysis shows that the prediction equations are unstable over time, and that the yield spread decomposition contributes significantly to the prediction of IP growth.

The Prediction of Blending Ratio of Cut Tobacco, Expanded Stem, and Expanded Cut Tobacco in Cigarettes using Near Infrared Spectroscopy (근적외분광법을 이용한 권련 중 일반각초, 팽화주맥 및 팽화각초 배합비 분석)

  • 김용옥;정한주;김기환
    • Journal of the Korean Society of Tobacco Science / v.22 no.1 / pp.76-83 / 2000
  • This study was carried out to predict the blending ratio of cut tobacco (CT), expanded stem (ES), and expanded cut tobacco (ECT) in cigarettes. CT, ES, and ECT samples from brand A were ground, blended with reference to the brand A blending ratio, and scanned by near infrared spectroscopy (NIRSystem Co., Model 6500). Calibration equations were developed, and the blending ratio was then determined by NIRS. The standard error of calibration (SEC) and standard error of performance (SEP) between the NIRS values and the known blending ratios for C factory samples were 0.97% and 1.93% for CT, 0.50% and 1.12% for ES, and 0.68% and 1.10% for ECT, respectively. The SEPs for CT, ES, and ECT of B and D factory samples determined by the C factory calibration equation were less accurate than those of C factory samples determined by the same equation. These results were caused by differences in the CT, ES, and ECT spectra between factories. The SEPs for CT, ES, and ECT of the B and D factories determined by calibration equations derived from each factory's own samples were more accurate than those determined by the calibration equation derived from C factory samples. For each factory, the SEPs for CT, ES, and ECT determined by the calibration equation derived from all calibration samples (B+C+D factories) were similar to those determined by the calibration equation derived from that factory's samples. To reduce the analytical inaccuracy caused by spectral differences, a specific calibration equation should be applied to each factory's samples. Developing specific calibrations between samples and NIRS spectra may provide a method for rapid determination of the blending ratio of CT, ES, and ECT.
