• Title/Summary/Keyword: multiple regression techniques

An Optimized Combination of π-fuzzy Logic and Support Vector Machine for Stock Market Prediction (주식 시장 예측을 위한 π-퍼지 논리와 SVM의 최적 결합)

  • Dao, Tuanhung;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.20 no.4 / pp.43-58 / 2014
  • As the use of trading systems has increased rapidly, many researchers have become interested in developing effective stock market prediction models using artificial intelligence techniques. Stock market prediction involves multifaceted interactions between market-controlling factors and unknown random processes. A successful stock prediction model achieves the most accurate result from minimum input data with the least complex model. In this research, we develop a model that combines π-fuzzy logic with a support vector machine (SVM), using a genetic algorithm to optimize the parameters of the SVM and the π-fuzzy functions, as well as feature subset selection, to improve the performance of stock market prediction. To evaluate the proposed model, we compare its performance with that of other models on the same data, including logistic regression, multiple discriminant analysis, classification and regression tree, artificial neural network, SVM, and fuzzy SVM models. The results show that our model outperforms all of the comparison models in both prediction accuracy and return on investment.

Analysis of Ski Socialization Process of Undergraduates (대학생의 스키사회화 과정 분석)

  • Song, Kang-Young;Kim, Kyong-Sik
    • The Journal of the Korea Contents Association / v.7 no.9 / pp.167-175 / 2007
  • The purpose of this study was to examine the effect of significant others' interest and participation in skiing, encouragement to ski, and attitude toward skiing on ski socialization. To this end, university ski class participants in Seoul in 2005 were set as the population, and 200 people in total were drawn by cluster random sampling and analyzed. The reliability check yielded Cronbach's α values above .600. Confirmatory factor analysis, logistic regression analysis, and multiple regression analysis were used as the statistical techniques. Based on the results of the study, the following conclusions appear warranted. First, significant others influence the ski participation experience of undergraduates. Second, significant others influence the ski participation frequency of undergraduates. Third, significant others influence the ski participation period of undergraduates. Fourth, significant others influence the skiing skill of undergraduates.

Analysis of Important Indicators of TCB Using GBM (일반화가속모형을 이용한 기술신용평가 주요 지표 분석)

  • Jeon, Woo-Jeong(Michael);Seo, Young-Wook
    • The Journal of Society for e-Business Studies / v.22 no.4 / pp.159-173 / 2017
  • To provide technology-related financial support to technology-based small and medium-sized venture companies, the government implemented the TCB evaluation, a kind of technology rating evaluation, conducted by Kibo and qualified private TCBs. In this paper, we briefly review the current state of TCB evaluation and the technology evaluation indicators accumulated in the Korea Credit Information Services (TDB), and then use multiple regression techniques to explore the indicators that have a significant effect on the technology rating score. The key indicators were then applied as independent features to the generalized boosting model (GBM), a representative machine learning classifier, to calculate the relative importance and classification accuracy of the indicators, together with the class influence and the fit of each model. The analysis showed that the relative importance of the indicators did not differ significantly between the two models. However, the GBM model placed more weight on the InnoBiz certification, R&D department, patent registration, and venture confirmation indicators than the regression model did.

Development of Water Level Prediction Models Using Deep Neural Network in Mountain Wetlands (딥러닝을 활용한 산지습지 수위 예측 모형 개발)

  • Kim, Donghyun;Kim, Jungwook;Kwak, Jaewon;Necesito, Imee V.;Kim, Jongsung;Kim, Hung Soo
    • Journal of Wetlands Research / v.22 no.2 / pp.106-112 / 2020
  • Wetlands play an important role in the hydrological, environmental, and ecological aspects of a watershed. The water level in a wetland is essential for various analyses, such as determining wetland function and its effects on the environment. Because many wetlands are ungauged, research on wetland water level prediction is uncommon. Therefore, this study developed water level prediction models using multiple regression analysis, principal component regression analysis, an artificial neural network, and a deep neural network (DNN) to predict wetland water level. Geumjeong-Mountain Wetland, located in Yangsan-city, Gyeongsangnam-do province, was selected as the target area, and water level measurements from April 2017 to July 2018 were used as the dependent variable. Hydrological and meteorological data were used as the independent variables. After evaluating predictive power, the water level prediction model using the DNN was selected as the final model, as it showed an RMSE value of 6.359 and an NRMSE value of 18.91%. This study is expected to be useful as basic data for developing wetland maintenance and management techniques that use the water level at currently ungauged points.
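
The RMSE and NRMSE figures quoted in this abstract can be illustrated with a minimal sketch. The abstract does not say how the NRMSE was normalized; dividing by the range of the observed values is an assumption made here, and the water levels below are hypothetical, not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def nrmse_percent(observed, predicted):
    """RMSE divided by the observed range, in percent (assumed normalization)."""
    span = max(observed) - min(observed)
    if span == 0:
        raise ValueError("observed values have zero range")
    return 100.0 * rmse(observed, predicted) / span


# Hypothetical water levels, not measurements from the study.
observed = [10.0, 12.0, 15.0, 11.0]
predicted = [11.0, 12.0, 13.0, 11.0]

print(f"RMSE  = {rmse(observed, predicted):.3f}")
print(f"NRMSE = {nrmse_percent(observed, predicted):.2f}%")
```

Other normalizations (by the mean or the standard deviation of the observations) are also in common use, so NRMSE values are only comparable when the same convention is applied.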

The Effects of Ecological Variables on Volunteering among Older Adults: The Applications of General Ecological Theory of Aging (노인자원봉사활동에 있어서 생태환경 변수의 효과: 노화의 일반생태학 이론을 적용하여)

  • Lee, Hyunkee
    • 한국노년학 / v.32 no.3 / pp.777-800 / 2012
  • This paper aims to estimate the effects of environmental variables on volunteering among older persons and to determine the relationships between the independent and dependent variables. Conceptually, it points out that the integrated theory of resources overemphasizes the roles of human, social, and cultural capital while overlooking the influence of ecological environments in explaining volunteering among older persons. It applies the general ecological theory of aging, together with resource frameworks, to explain volunteering among older people and to estimate the effects of ecological environment variables on volunteerism among senior citizens. Using microdata from the 2009 National Social Survey by Statistics Korea, the paper selects 10,268 subjects over 55 years of age who are considered socially retired. Multiple OLS regression and binomial logistic regression are used to estimate the effects of ecological environments and resources on volunteering. The results show that all of the environmental and resource variables are related to volunteering at the level of p<.000. This means that environmental variables have independent effects on volunteerism, controlling for resource variables. These results suggest that both theories have empirical support in explaining volunteerism in Korea. Theoretical and policy implications for practice and future studies are discussed at the end of the paper.

Water Quality Assessment and Turbidity Prediction Using Multivariate Statistical Techniques: A Case Study of the Cheurfa Dam in Northwestern Algeria

  • ADDOUCHE, Amina;RIGHI, Ali;HAMRI, Mehdi Mohamed;BENGHAREZ, Zohra;ZIZI, Zahia
    • Applied Chemistry for Engineering / v.33 no.6 / pp.563-573 / 2022
  • This work aimed to develop a new equation for turbidity (Turb) simulation and prediction using statistical methods based on principal component analysis (PCA) and multiple linear regression (MLR). For this purpose, water samples were collected monthly over a five-year period from Cheurfa dam, an important reservoir in Northwestern Algeria, and analyzed for 12 parameters: temperature (T°), pH, electrical conductivity (EC), turbidity (Turb), dissolved oxygen (DO), ammonium (NH₄⁺), nitrate (NO₃⁻), nitrite (NO₂⁻), phosphate (PO₄³⁻), total suspended solids (TSS), biochemical oxygen demand (BOD₅), and chemical oxygen demand (COD). The results revealed strong mineralization of the water and low dissolved oxygen (DO) content during the summer period. High levels of TSS and Turb were recorded during rainy periods. In addition, the water was loaded with phosphate (PO₄³⁻) throughout the study period. The PCA results revealed ten factors, three of which were significant (eigenvalues > 1) and explained 75.5% of the total variance. The F1 and F2 factors explained 36.5% and 26.7% of the total variance, respectively, and indicated anthropogenic pollution of domestic, agricultural, and industrial origin. The MLR turbidity simulation model exhibited a high coefficient of determination (R² = 92.20%), indicating that 92.20% of the data variability can be explained by the model. TSS, DO, EC, NO₃⁻, NO₂⁻, and COD were the most significant contributing parameters (p values << 0.05) in turbidity prediction. The present study can support decision-making on the management and monitoring of the water quality of the dam, which is the primary source of drinking water in this region.
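
To make the coefficient of determination reported for the MLR model concrete, the following is a minimal sketch of an ordinary least-squares fit via the normal equations, followed by an R² calculation. It is not the authors' model: the two predictors and all values are hypothetical, and the helper names (`fit_mlr`, `r_squared`, and so on) are introduced here for illustration only.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_mlr(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    """Fitted value for one row of predictors."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


def r_squared(y, fitted):
    """Share of the variance in y explained by the fitted values."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical predictor pairs and responses, not data from the study.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
y = [3.1, 6.8, 7.2, 10.9, 12.0]

coefs = fit_mlr(rows, y)
fitted = [predict(coefs, r) for r in rows]
r2 = r_squared(y, fitted)
print(f"coefficients = {coefs}")
print(f"R^2 = {100 * r2:.2f}%")
```

The normal equations are used only to keep the sketch dependency-free; for real data with many correlated predictors, a QR- or SVD-based least-squares routine from a numerical library is the more stable choice.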

Convolution Neural Network for Prediction of DNA Length and Number of Species (DNA 길이와 혼합 종 개수 예측을 위한 합성곱 신경망)

  • Sunghee Yang;Yeone Kim;Hyomin Lee
    • Korean Chemical Engineering Research / v.62 no.3 / pp.274-280 / 2024
  • Machine learning techniques utilizing neural networks have been employed in various fields such as disease gene discovery and diagnosis, drug development, and prediction of drug-induced liver injury. Disease features can be investigated through the molecular information of DNA. In this study, we developed a neural network to predict the length of DNA and the number of DNA species in a mixture solution, both of which are representative molecular information of DNA. To address the time-consuming nature of gel electrophoresis, the conventional analysis method, we analyzed the dynamic data of a microfluidic concentrating device. The dynamic data were reconstructed into a spatiotemporal map, which reduced the computational cost required for training and prediction. We employed a convolutional neural network to improve the accuracy of the spatiotemporal map analysis. As a result, we successfully performed single DNA length prediction as single-variable regression, simultaneous prediction of multiple DNA lengths as multivariable regression, and prediction of the number of DNA species in a mixture as binary classification. Additionally, based on the composition of the training data, we proposed a solution to the problem of prediction bias. This study could support medical diagnosis based on optical measurement, such as liquid biopsy of cell-free DNA and cancer diagnosis.

Development and Application of Imputation Technique Based on NPR for Missing Traffic Data (NPR기반 누락 교통자료 추정기법 개발 및 적용)

  • Jang, Hyeon-Ho;Han, Dong-Hui;Lee, Tae-Gyeong;Lee, Yeong-In;Won, Je-Mu
    • Journal of Korean Society of Transportation / v.28 no.3 / pp.61-74 / 2010
  • ITS (intelligent transportation systems) collect real-time traffic data and accumulate vast amounts of historical data. However, this historical data has not been managed or used efficiently. With the introduction of data management systems such as ADMS (Archived Data Management System), the potential of this large body of historical data has become far more apparent. Traffic data in any data management system inherently includes missing values, and missing data has been one of the major obstacles to applying these data because it often renders an entire dataset useless. For these reasons, imputation techniques play a key role in data management systems. To address these limitations, this paper presents a promising imputation technique that can be deployed in data management systems and robustly generates estimates for the missing values in historical data. The developed model, based on the NPR (non-parametric regression) approach, exploits various traffic data patterns in the historical data and is designed for practical requirements such as minimizing the number of parameters, computational speed, the imputation of various types of missing data, and multiple imputation. The model was tested under various missing data conditions. The results showed that the model outperforms previously reported approaches in terms of prediction accuracy and meets the computational speed required for deployment in traffic data management systems.
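
The abstract does not describe the internals of the NPR model. As a rough illustration of the general idea of non-parametric regression for imputation, the sketch below uses a k-nearest-neighbour scheme, which is an assumption and not the authors' method: it matches a day with gaps against historical daily profiles on the slots that were observed, then averages the nearest profiles at the missing slots. The function name `impute_knn` and all traffic volumes are hypothetical.

```python
import math


def impute_knn(target, history, k=2):
    """Fill None entries in target with the mean of the k nearest historical profiles."""
    observed = [i for i, v in enumerate(target) if v is not None]
    if not observed:
        raise ValueError("target has no observed values to match on")

    def distance(profile):
        # Euclidean distance computed on the observed time slots only.
        return math.sqrt(sum((profile[i] - target[i]) ** 2 for i in observed))

    neighbours = sorted(history, key=distance)[:k]
    return [
        v if v is not None else sum(p[i] for p in neighbours) / len(neighbours)
        for i, v in enumerate(target)
    ]


# Hypothetical traffic volumes per time slot for three historical days.
history = [
    [100, 120, 140, 130],
    [98, 118, 150, 128],
    [60, 70, 80, 75],
]
target = [101, 119, None, 129]  # the third slot is missing

filled = impute_knn(target, history, k=2)
print(filled)  # [101, 119, 145.0, 129]
```

The only tuning parameter here is `k`, which fits the abstract's emphasis on keeping the number of parameters small; a production system would also need to handle historical profiles that themselves contain gaps.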

Predicting Forest Gross Primary Production Using Machine Learning Algorithms (머신러닝 기법의 산림 총일차생산성 예측 모델 비교)

  • Lee, Bora;Jang, Keunchang;Kim, Eunsook;Kang, Minseok;Chun, Jung-Hwa;Lim, Jong-Hwan
    • Korean Journal of Agricultural and Forest Meteorology / v.21 no.1 / pp.29-41 / 2019
  • Terrestrial Gross Primary Production (GPP) is the largest global carbon flux, and forest ecosystems are important because they can store substantially more carbon than other terrestrial ecosystems. There have been several attempts to estimate GPP using mechanism-based models. However, mechanism-based models, which include biological, chemical, and physical processes, are limited by a lack of flexibility in predicting non-stationary ecological processes caused by local and global change. Instead, mechanism-free methods are strongly recommended for estimating nonlinear dynamics that occur in nature, such as GPP. Therefore, we used mechanism-free machine learning techniques to estimate daily GPP. In this study, support vector machine (SVM), random forest (RF), and artificial neural network (ANN) models were used and compared with the traditional multiple linear regression model (LM). MODIS products and meteorological parameters from eddy covariance data from 2006 to 2013 were employed to train the machine learning and LM models. The GPP prediction models were compared with daily GPP from eddy covariance measurements in a deciduous forest in South Korea in 2014 and 2015. Statistical measures including the correlation coefficient (R), root mean square error (RMSE), and mean squared error (MSE) were used to evaluate the performance of the models. In general, the machine learning models (R = 0.85 - 0.93, MSE = 1.00 - 2.05, p < 0.001) performed better than the linear regression model (R = 0.82 - 0.92, MSE = 1.24 - 2.45, p < 0.001). These results provide insight into the high predictability and the potential for wider application of mechanism-free machine learning models and remote sensing for predicting non-stationary ecological processes such as seasonal GPP.

Determination of shear wave velocity profiles in soil deposit from seismic piezo-cone penetration test (탄성파 피에조콘 관입 시험을 통한 국내 퇴적 지반의 전단파 속도 결정)

  • Sun Chung Guk;Jung Gyungja;Jung Jong Hong;Kim Hong-Jong;Cho Sung-Min
    • 한국지구물리탐사학회:학술대회논문집 / 2005.09a / pp.125-153 / 2005
  • The seismic piezo-cone penetration test (SCPTU) is widely known as one of the most useful techniques for investigating geotechnical characteristics, including dynamic soil properties. As a practical application in Korea, SCPTU was carried out at two sites in Busan and four sites in Incheon, which are mainly composed of alluvial or marine soil deposits. From the SCPTU waveform data obtained at the testing sites, the first arrival times of shear waves and the corresponding time differences with depth were determined using the cross-over method. The shear wave velocity (VS) profiles were then derived using the refracted ray path method based on Snell's law, and their trends were similar to those of the cone tip resistance (qt) profiles. In the Incheon area, the testing depths of SCPTU were deeper than those of conventional down-hole seismic tests. Moreover, to apply the conventional CPTU to earthquake engineering practice, correlations between VS and CPTU data were deduced from the SCPTU results. For the empirical evaluation of VS for all soils together, as well as for clays and sands classified unambiguously in this study by the soil behavior type classification index (IC), the authors suggest VS-CPTU data correlations expressed as functions of four parameters, qt, fs, $\sigma_{v0}$, and Bq, determined by multiple statistical regression modeling. Despite the incompatible strain levels of the down-hole seismic test during SCPTU and the conventional CPTU, the VS-CPTU data correlations suggested in this study for all soils, clays, and sands are shown to be applicable to the preliminary estimation of VS for Korean deposits and to be more reliable than previous correlations proposed by other researchers.
